Multimedia, IEEE Transactions on

Issue 4 • Date June 2009

  • Table of contents

    Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    Freely Available from IEEE
  • An Efficient Mode Selection Prior to the Actual Encoding for H.264/AVC Encoder

    Page(s): 581 - 588

    Many video compression algorithms require decisions to be made to select between different coding modes. In the case of H.264, this includes decisions about whether or not motion compensation is used, and the block size to be used for motion compensation. It has been proposed that constrained optimization techniques, such as the method of Lagrange multipliers, can be used to trade off between the quality of the compressed video and the bit rate generated. In this paper, we show that in many cases of practical interest, very similar results can be achieved with much simpler optimizations. Mode selection by simply minimizing the distortion with motion vectors and header information produces very similar performance to the full constrained optimization, while it reduces the mode selection and overall encoding time by 31% and 12%, respectively. The proposed approach can be applied together with fast motion search algorithms and the mode filtering algorithms for further speedup.
  • High-Quality Mipmapping Texture Compression With Alpha Maps for Graphics Processing Units

    Page(s): 589 - 599

    Texture compression is an important technique in graphics processing units (GPUs) for saving memory bandwidth. This paper presents a high-quality mipmapping texture compression (MTC) system with alpha maps. Based upon the wavelet transform, a hierarchical approach is adopted for mipmapping textures in the YCbCr color space and alpha channel. By inspecting the similarity between the alpha and luminance channels, the two channels are efficiently encoded together with linear prediction in the differential mode. In addition, the split mode manages textures with no strong relationship between the alpha and luminance channels. A layer overlapping technique is also proposed to reduce the texture memory bandwidth. Simulation results show that MTC can reduce the texture access traffic by 80% to 90% and provides high image quality as well. Compared with DirectX texture compression (DXTC), the most well-known texture compression with alpha maps, MTC reduces the texture access bandwidth by 30% more. VLSI implementation results show that the hardware cost of MTC is similar to that of DXTC and that MTC is suitable for integration in GPUs to provide high-quality textures with low memory bandwidth requirements.
  • Expression-Invariant Face Recognition With Constrained Optical Flow Warping

    Page(s): 600 - 610

    Face recognition is one of the most intensively studied topics in computer vision and pattern recognition, but few studies focus on how to robustly recognize expressional faces with one single training sample per class. In this paper, we modify the regularization-based optical flow algorithm by imposing constraints on some given point correspondences to compute precise pixel displacements and intensity variations. By using the optical flow computed for the input expression-variant face with respect to a reference neutral face image, we remove the expression from the face image by elastic image warping to recognize the subject with facial expression. Experimental validation is given to show that the proposed expression normalization algorithm significantly improves the accuracy of face recognition on expression-variant faces.
  • 3-D Face Detection, Landmark Localization, and Registration Using a Point Distribution Model

    Page(s): 611 - 623

    We present an accurate and robust framework for detecting and segmenting faces, localizing landmarks, and achieving fine registration of face meshes based on the fitting of a facial model. This model is based on a 3-D Point Distribution Model (PDM) that is fitted without relying on texture, pose, or orientation information. Fitting is initialized using candidate locations on the mesh, which are extracted from low-level curvature-based feature maps. Face detection is performed by classifying the transformations between model points and candidate vertices based on the upper-bound of the deviation of the parameters from the mean model. Landmark localization is performed on the segmented face by finding the transformation that minimizes the deviation of the model from the mean shape. Face registration is obtained using prior anthropometric knowledge and the localized landmarks. The performance of face detection is evaluated on a database of faces and non-face objects where we achieve an accuracy of 99.6%. We also demonstrate face detection and segmentation on objects with different scale and pose. The robustness of landmark localization is evaluated with noisy data and by varying the number of shapes and model points used in the model learning phase. Finally, face registration is compared with the traditional Iterative Closest Point (ICP) method and evaluated through a face retrieval and recognition framework on the GavabDB dataset, where we achieve a recognition rate of 87.4% and a retrieval rate of 83.9%.
  • Segmentation-Driven Image Fusion Based on Alpha-Stable Modeling of Wavelet Coefficients

    Page(s): 624 - 633

    A novel region-based image fusion framework based on multiscale image segmentation and statistical feature extraction is proposed. A dual-tree complex wavelet transform (DT-CWT) and a statistical region merging algorithm are used to produce a region map of the source images. The input images are partitioned into meaningful regions containing salient information via symmetric alpha-stable (SαS) distributions. The region features are then modeled using bivariate alpha-stable (BαS) distributions, and the statistical measure of similarity between corresponding regions of the source images is calculated as the Kullback-Leibler distance (KLD) between the estimated BαS models. Finally, a segmentation-driven approach is used to fuse the images, region by region, in the complex wavelet domain. A novel decision method is introduced by considering the local statistical properties within the regions, which significantly improves the reliability of the feature selection and fusion processes. Simulation results demonstrate that the bivariate alpha-stable model outperforms the univariate alpha-stable and generalized Gaussian densities by not only capturing the heavy-tailed behavior of the subband marginal distribution, but also the strong statistical dependencies between wavelet coefficients at different scales. The experiments show that our algorithm achieves better performance in comparison with previously proposed pixel and region-level fusion approaches in both subjective and objective evaluation tests.
  • The Rhombic Dodecahedron Map: An Efficient Scheme for Encoding Panoramic Video

    Page(s): 634 - 644

    Omnidirectional videos are usually mapped to a planar domain for encoding with off-the-shelf video compression standards. However, existing work typically neglects the effect of the sphere-to-plane mapping. In this paper, we show that by carefully designing the mapping, we can improve the visual quality, stability and compression efficiency of encoding omnidirectional videos. Here we propose a novel mapping scheme, known as the rhombic dodecahedron map (RD map), to represent data over the spherical domain. By using a family of skew great circles as the subdivision kernel, the RD map not only produces a sampling pattern with very low discrepancy, but also supports a highly efficient data indexing mechanism over the spherical domain. Since the proposed map is quad-based, geodesic-aligned, and of very low area and shape distortion, we can reliably apply 2-D wavelet-based and DCT-based encoding methods that were originally designed for planar perspective videos. Finally, we perform a series of analyses and experiments to investigate and verify the effectiveness of the proposed method; with its ultra-fast data indexing capability, we show that we can play back omnidirectional videos with very high frame rates on conventional PCs with GPU support.
  • On the Design and Prototype Implementation of a Multimodal Situation Aware System

    Page(s): 645 - 657

    In this paper, we describe the design concepts and prototype implementation of a situation aware ubiquitous computing system using multiple modalities such as National Marine Electronics Association (NMEA) data from Global Positioning System (GPS) receivers, text, speech, environmental audio, and handwriting inputs. While most mobile and communication devices know where and who they are, by accessing context information primarily in the form of location, time stamps, and user identity, the concept of sharing this information in a reliable and intelligent fashion is crucial in many scenarios. A framework which takes the concept of context aware computing to the level of situation aware computing by intelligent information exchange between context aware devices is designed and implemented in this work. Four sensory modes of contextual information, namely text, speech, environmental audio, and handwriting, are added to conventional contextual information sources like location from GPS, user identity based on IP addresses (IPA), and time stamps. Each device derives its context not necessarily using the same criteria or parameters but by employing selective fusion and fission of multiple modalities. The processing of each individual modality takes place at the client device, followed by the summarization of context as a text file. Exchange of dynamic context information between devices is enabled in real time to create multimodal situation aware devices. A central repository of all user context profiles is also created to enable self-learning devices in the future. Based on the results of simulated situations and real field deployments, it is shown that the use of multiple modalities like speech, environmental audio, and handwriting inputs along with conventional modalities can create devices with enhanced situational awareness.
  • Text-Like Segmentation of General Audio for Content-Based Retrieval

    Page(s): 658 - 669

    Automatic detection of (semantically) meaningful audio segments, or audio scenes, is an important step in high-level semantic inference from general audio signals, and can benefit various content-based applications involving both audio and multimodal (multimedia) data sets. Motivated by the known limitations of traditional low-level feature-based approaches, we propose in this paper a novel approach to discover audio scenes, based on an analysis of audio elements and key audio elements, which can be seen as equivalents to the words and keywords in a text document, respectively. In the proposed approach, an audio track is seen as a sequence of audio elements, and the presence of an audio scene boundary at a given time stamp is checked by pairwise measurement of the semantic affinity between different parts of the analyzed audio stream surrounding that time stamp. Our proposed model for semantic affinity exploits the proven concepts from text document analysis, and is introduced here as a function of the distance between the audio parts considered, and the co-occurrence statistics and the importance weights of the audio elements contained therein. Experimental evaluation performed on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than the traditional low-level feature-based approaches in solving the posed audio scene segmentation problem.
  • Automatic Music Genre Classification Based on Modulation Spectral Analysis of Spectral and Cepstral Features

    Page(s): 670 - 682

    In this paper, we propose an automatic music genre classification approach based on long-term modulation spectral analysis of spectral (OSC and MPEG-7 NASE) as well as cepstral (MFCC) features. Modulation spectral analysis of every feature value generates a corresponding modulation spectrum, and all the modulation spectra can be collected to form a modulation spectrogram which exhibits the time-varying or rhythmic information of music signals. Each modulation spectrum is then decomposed into several logarithmically-spaced modulation subbands. The modulation spectral contrast (MSC) and modulation spectral valley (MSV) are then computed from each modulation subband. Effective and compact features are generated from statistical aggregations of the MSCs and MSVs of all modulation subbands. An information fusion approach which integrates both a feature level fusion method and a decision level combination method is employed to improve the classification accuracy. Experiments conducted on two different music datasets have shown that our proposed approach can achieve higher classification accuracy than other approaches with the same experimental setup.
  • Recovering Connected Error Region Based on Adaptive Error Concealment Order Determination

    Page(s): 683 - 695

    Parts of compressed video streams may be lost or corrupted when being transmitted over bandwidth limited networks and wireless communication networks with error-prone channels. Error concealment (EC) techniques are often adopted at the decoder side to improve the quality of the reconstructed video. When a high proportion of data packets arrive at the decoder corrupted, it is likely that the incorrectly decoded macro-blocks (MBs) are concentrated in a connected region, where important spatial reference information is lost. The conventional EC methods usually carry out the block concealment following a lexicographic scan (from top to bottom and from left to right of the image), which makes the methods ineffective when the corrupted blocks are grouped in a connected region. In this paper, a temporal error concealment method, adaptive error concealment order determination (AECOD), is proposed to recover connected corrupted regions. The processing order of an MB in a connected corrupted region is adaptively determined by analyzing the external boundary patterns of the MBs in its neighborhood. The performance of the proposed EC scheme on several video sequences has been compared with that obtained by using other error concealment methods reported in the literature. Experimental results show that the AECOD algorithm can improve the recovery performance with respect to the other considered EC methods.
  • Performance Analysis for Overlay Multimedia Multicast on r-ary Tree and m-D Mesh Topologies

    Page(s): 696 - 706

    Without requiring multicast support from the underlying networks, overlay multicast has the advantage of implementing inter-domain multimedia multicast communications. Usually, overlay multicast protocols employ two different topologies: r-ary tree and m-D mesh. In this paper, we study the influence of topology selection on multimedia multicast performance. We present a set of theoretical results on the worst performance, the average performance, and the performance difference in terms of the link stress, the number of overlay hops, and the number of shortest paths for r-ary tree-based and m-D mesh-based multicast, respectively. Furthermore, through simulations in NS2, we observe and compare tree and mesh topologies along the metrics analyzed theoretically. Simulation results match our theoretical analyses. Finally, we give our evaluations of and insights into these two kinds of multicast when used to transmit multimedia streams. The selection of overlay topology is application dependent. To the best of our knowledge, this is the first evaluation of multimedia multicast performance in different overlay topologies. We believe that this study is useful for protocol design of target multimedia applications and for investigating multicast functions.
  • An Adaptive Borrow-and-Return Model for Broadcasting Videos

    Page(s): 707 - 715

    Yang proposed the concept of borrow-and-return (BR) to leverage the unused server bandwidth when a group of popular videos is being broadcast with the FSFC (first segment on the first channel) broadcasting schemes, in order to improve the mean waiting time (MWT) of the viewers with the help of additional receiving bandwidth available at the high-end clients. The BR model borrows the bandwidth of the videos with no new-coming viewers during a timeslot to speed up the transmission of the first segments of some of the remaining videos. In this paper, we first address the relative advantage issue among various possible BR schemes by developing a parametric generic BR (GBR) scheme controlled externally by independent borrow parameters. Later, we propose a new BR (NBR) model by incorporating an efficient transmission strategy to reduce the MWT further. Finally, an optimal NBR scheme is developed by augmenting it with the optimal borrow parameters, which significantly outperforms the existing and new BR schemes in terms of overall MWT.
  • Proxy Caching for Video-on-Demand Using Flexible Starting Point Selection

    Page(s): 716 - 729

    In this paper, we propose a novel proxy caching scheme for video-on-demand (VoD) services. Our approach is based on the observation that streaming video users searching for some specific content or scene pay most attention to the initial delay, while a small shift of the starting point is acceptable. We present results from subjective VoD tests that relate waiting time and starting point deviation to user satisfaction. Based on this relationship as well as the dynamically changing popularity of video segments, we propose an efficient segment-based caching algorithm, which maximizes the user satisfaction by trading off between the initial delay and the deviation of starting point. Our caching scheme supports interactive video cassette recorder (VCR) functionalities and enables cache replacement with a much finer granularity compared to previously proposed segment-based approaches. Our experimental results show a significantly improved user satisfaction for our scheme compared to conventional caching schemes.
  • Structured Network Coding and Cooperative Wireless Ad-Hoc Peer-to-Peer Repair for WWAN Video Broadcast

    Page(s): 730 - 741

    In a scenario where each peer of an ad-hoc wireless local area network (WLAN) receives one of many available video streams from a wireless wide area network (WWAN), we propose a network-coding-based cooperative repair framework for the ad-hoc peer group to improve broadcast video quality during channel losses. Specifically, we first impose network coding structures globally, and then select the appropriate video streams and network coding types within the structures locally, so that repair can be optimized for broadcast video in a rate-distortion manner. Innovative probability (the likelihood that a repair packet is useful in data recovery to a receiving peer) is analyzed in this setting for accurate optimization of the network codes. Our simulation results show that by using our framework, video quality can be improved by up to 19.71 dB over un-repaired video stream and by up to 5.39 dB over video stream using traditional unstructured network coding.
  • Delay Constraint Error Control Protocol for Real-Time Video Communication

    Page(s): 742 - 751

    Real-time video communication over wireless channels is subject to information loss since wireless links are error-prone and susceptible to noise. Popular wireless link-layer protocols, such as retransmission (ARQ) based 802.11 and hybrid ARQ methods, provide some level of reliability while largely ignoring the latency issue which is critical for real-time applications. Therefore, they suffer from low throughput (under high-error rates) and large waiting-times leading to serious degradation of video playback quality. In this paper, we develop an analytical framework for video communication which captures the behavior of real-time video traffic at the wireless link-layer while taking into consideration both reliability and latency conditions. Using this framework, we introduce a delay constraint packet embedded error control (DC-PEEC) protocol for wireless link-layer. DC-PEEC ensures reliable and rapid delivery of video packets by employing various channel codes to minimize fluctuations in throughput and provide timely arrival of video. In addition to theoretically analyzing DC-PEEC, the performance of the proposed scheme is analyzed by simulating real-time video communication over "real" channel traces collected on 802.11b WLANs using H.264/AVC JM14.0 video codec. The experimental results demonstrate performance gains of 5-10 dB for different real-time video scenarios.
  • Distributed Rate Allocation Policies for Multihomed Video Streaming Over Heterogeneous Access Networks

    Page(s): 752 - 764

    We consider the problem of rate allocation among multiple simultaneous video streams sharing multiple heterogeneous access networks. We develop and evaluate an analytical framework for optimal rate allocation based on observed available bit rate (ABR) and round-trip time (RTT) over each access network and video distortion-rate (DR) characteristics. The rate allocation is formulated as a convex optimization problem that minimizes the total expected distortion of all video streams. We present a distributed approximation of its solution and compare its performance against H∞-optimal control and two heuristic schemes based on TCP-style additive-increase-multiplicative-decrease (AIMD) principles. The various rate allocation schemes are evaluated in simulations of multiple high-definition (HD) video streams sharing multiple access networks. Our results demonstrate that, in comparison with heuristic AIMD-based schemes, both media-aware allocation and H∞-optimal control benefit from proactive congestion avoidance and reduce the average packet loss rate from 45% to below 2%. Improvement in average received video quality ranges from 1.5 to 10.7 dB in PSNR for various background traffic loads and video playout deadlines. Media-aware allocation further exploits its knowledge of the video DR characteristics to achieve a more balanced video quality among all streams.
  • Coalition-Based Resource Negotiation for Multimedia Applications in Informationally Decentralized Networks

    Page(s): 765 - 779

    Designing efficient and fair solutions for dividing the network resources in a distributed manner among self-interested multimedia users has recently become an important research topic because heterogeneous and high bandwidth multimedia applications (users), having different quality-of-service requirements, are sharing the same network. Suitable resource negotiation solutions need to explicitly consider the amount of information exchanged among the users and the computational complexity incurred by the users. In this paper, we propose decentralized solutions for resource negotiation, where multiple autonomous users self-organize into a coalition which shares the same network resources and negotiate the division of these resources by exchanging information about their requirements. We then discuss various resource sharing strategies that the users can deploy based on their exchanged information. Several of these strategies are designed to explicitly consider the utility (i.e., video quality) impact of multimedia applications. In order to quantify the utility benefit derived by exchanging different information, we define a new metric, which we refer to as the value of information. We quantify through simulations the improvements that can be achieved when various information is exchanged between users, and discuss the required complexity at the user side involved in implementing the various resource negotiation strategies.
  • Episode-Constrained Cross-Validation in Video Concept Retrieval

    Page(s): 780 - 785

    Whereas video tells a narrative by a composition of shots, current video retrieval methods focus mainly on single shots. In retrieval performance estimation, similar shots in a narrative may result in performance overestimation. We propose an episode-based version of cross-validation leading to up to 14% classification improvement over shot-based cross-validation.
  • Service Adaptability in Multimedia Wireless Networks

    Page(s): 786 - 792

    Next-generation wireless communication systems aim at supporting wireless multimedia services with different quality-of-service (QoS) and bandwidth requirements. Therefore, effective management of the limited radio resources is important to enhance the network performance. In this paper, we propose a QoS adaptive multimedia service framework for controlling the traffic in multimedia wireless networks (MWN) that enhances the current methods used in cellular environments. The proposed framework is designed to take advantage of the adaptive bandwidth allocation (ABA) algorithm with new calls in order to improve the system utilization and the blocking probability of new calls. The performance of our framework is compared to that of an existing framework in the literature. Simulation results show that our QoS adaptive multimedia service framework outperforms the existing framework in terms of new call blocking probability, handoff call dropping probability, and bandwidth utilization.
  • IEEE Transactions on Multimedia EDICS

    Page(s): 793
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for authors

    Page(s): 794 - 795
    Freely Available from IEEE
  • Special issue on Processing Reverberant Speech

    Page(s): 796
    Freely Available from IEEE
  • IEEE Transactions on Multimedia society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical is the various aspects of research in multimedia technology and applications of multimedia.

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo