
IEEE Transactions on Multimedia

Issue 1 • January 2008

  • Table of contents

    Page(s): C1 - C4
    PDF (42 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    PDF (33 KB)
    Freely Available from IEEE
  • Editorial

    Page(s): 1
    PDF (258 KB)
    Freely Available from IEEE
  • Video Error Concealment Using Spatio-Temporal Boundary Matching and Partial Differential Equation

    Page(s): 2 - 15
    PDF (1569 KB) | HTML

    Error concealment techniques are very important for video communication, since compressed video sequences may be corrupted or lost when transmitted over error-prone networks. In this paper, we propose a novel two-stage error concealment scheme for erroneously received video sequences. In the first stage, we propose a novel spatio-temporal boundary matching algorithm (STBMA) to reconstruct lost motion vectors (MVs). A well-defined cost function is introduced which exploits both the spatial and temporal smoothness properties of video signals. By minimizing this cost function, the MV of each lost macroblock (MB) is recovered, and the corresponding reference MB in the reference frame is obtained using this MV. In the second stage, instead of directly copying the reference MB as the final recovered pixel values, we use a novel partial differential equation (PDE) based algorithm to refine the reconstruction. We minimize, in a weighted manner, the difference between the gradient field of the reconstructed MB in the current frame and that of the reference MB in the reference frame under the given boundary condition. A weighting factor is used to control the regulation level according to the local degree of blockiness. With this algorithm, annoying blocking artifacts are effectively reduced while the structures of the reference MB are well preserved. Compared with the error concealment feature implemented in the H.264 reference software, our algorithm achieves significantly higher PSNR as well as better visual quality. (See the sketch below.)

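    The core of the first stage is a boundary-matching search over candidate motion vectors. Below is a minimal sketch of that idea, assuming grayscale NumPy frames, a 16x16 macroblock whose one-pixel ring of neighboring pixels was received correctly, and an illustrative candidate-MV set; this is not the paper's exact STBMA cost, which also includes a temporal smoothness term.

    ```python
    # Hedged sketch of boundary-matching MV recovery; not the paper's STBMA.
    import numpy as np

    B = 16  # macroblock size (assumption)

    def boundary_cost(cur, ref, x, y, mv):
        """Spatial term only: compare the 1-pixel ring around the lost MB
        in the current frame with the ring around the candidate reference
        MB displaced by mv = (dx, dy). Assumes indices stay in bounds."""
        dx, dy = mv
        ring_cur = np.concatenate([
            cur[y - 1, x:x + B], cur[y + B, x:x + B],
            cur[y:y + B, x - 1], cur[y:y + B, x + B]])
        ring_ref = np.concatenate([
            ref[y + dy - 1, x + dx:x + dx + B], ref[y + dy + B, x + dx:x + dx + B],
            ref[y + dy:y + dy + B, x + dx - 1], ref[y + dy:y + dy + B, x + dx + B]])
        return np.abs(ring_cur.astype(int) - ring_ref.astype(int)).sum()

    def conceal(cur, ref, x, y, candidates):
        """Recover the lost MB at (x, y): pick the candidate MV (e.g., MVs
        of neighboring MBs plus (0, 0)) with minimum boundary cost, then
        copy the matched reference MB. Stage two's PDE refinement would
        further adjust these copied pixels."""
        dx, dy = min(candidates, key=lambda mv: boundary_cost(cur, ref, x, y, mv))
        cur[y:y + B, x:x + B] = ref[y + dy:y + dy + B, x + dx:x + dx + B]
        return (dx, dy)
    ```
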
  • Paired Subimage Matching Watermarking Method on Ordered Dither Images and Its High-Quality Progressive Coding

    Page(s): 16 - 30
    PDF (5982 KB) | HTML

    In this paper, we present two novel robust methods for embedding watermarks into dithered halftone images. The first method, paired subimage matching ordered dithering (PSMOD), provides the decoder with a priori knowledge of the original watermark; its corresponding application is copyright protection. The other method, blind paired subimage matching ordered dithering (BPSMOD), does not require knowledge of the original watermark, and its main application is secret communication. Both methods utilize bit- and sub-subimage-interleaving preprocesses. Experiments show that both techniques are sufficiently robust to withstand cropping, tampering, and print-and-scan degradation, for either B/W or color dithered images, and are flexible enough to support various levels of embedding capacity. Furthermore, a novel progressive coding scheme is also presented for the efficient display of dithered images. After the bit-interleaving preprocessing, this algorithm utilizes the characteristics of the reordered image to determine the transmission order and then progressively reconstructs the dithered image. Moreover, the dithered images are further compressed by lossy and lossless procedures. The experimental results demonstrate high-quality reconstructions while maintaining low transmitted bit rates. (See the sketch below.)

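    Both watermarking methods rely on interleaving preprocesses that scatter neighboring halftone bits across subimages. Below is a minimal sketch of one plausible form of such a preprocess, assuming a binary NumPy image and a periodic subsampling pattern; the actual PSMOD/BPSMOD pipeline is more involved.

    ```python
    # Hedged sketch of a subimage-interleaving preprocess; illustrative only.
    import numpy as np

    def interleave(halftone, period=4):
        """Split a binary halftone image into period**2 subimages by
        periodic subsampling, so spatially adjacent bits land in different
        subimages; localized damage (e.g., cropping) then spreads thinly
        over all subimages instead of wiping one out."""
        return [halftone[r::period, c::period]
                for r in range(period) for c in range(period)]

    def deinterleave(subimages, shape, period=4):
        """Inverse of interleave()."""
        out = np.zeros(shape, dtype=subimages[0].dtype)
        k = 0
        for r in range(period):
            for c in range(period):
                out[r::period, c::period] = subimages[k]
                k += 1
        return out

    img = (np.random.rand(64, 64) > 0.5)
    assert (deinterleave(interleave(img), img.shape) == img).all()
    ```
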
  • A Highly Efficient VLSI Architecture for H.264/AVC CAVLC Decoder

    Page(s): 31 - 42
    PDF (641 KB) | HTML

    In this paper, an efficient algorithm is proposed to improve the decoding efficiency of the context-based adaptive variable length coding (CAVLC) procedure. Due to the data dependency among symbols in the decoding flow, the CAVLC decoder requires a large computation time, which dominates the overall performance of the decoder system. To expedite decoding, the critical path in the CAVLC decoder is first analyzed and then shortened by forwarding the adaptive detection for succeeding symbols. With the shortened critical path, the CAVLC architecture is further divided into two segments that can easily be implemented as a pipeline, effectively improving overall performance. In the hardware implementation, a low-power combined LUT and a single output buffer are adopted to reduce area as well as power consumption without affecting decoding performance. Experimental results show that the proposed architecture, which surpasses other recent designs, reduces power consumption by approximately 40% and achieves three times the decoding speed of the original decoding procedure suggested in the H.264 standard. The maximum frequency can exceed 210 MHz, which easily supports the real-time requirements of resolutions higher than the HD1080 format. (See the sketch below.)

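    The speedup comes from replacing a serial bit-by-bit decoding loop with a table lookup. Below is a toy sketch of the flat-LUT idea with a made-up code table; these are not the H.264 coeff_token tables, and real CAVLC switches among several such tables adaptively.

    ```python
    # Hedged sketch: flat-LUT decoding of a toy variable-length code.
    TOY_VLC = {"1": 0, "01": 1, "001": 2, "0001": 3}   # made-up codes
    MAX_LEN = max(len(c) for c in TOY_VLC)

    # Expand prefixes into a LUT keyed by the next MAX_LEN bits, so one
    # lookup yields (symbol, code_length) without a bit-serial loop --
    # the serial loop is what lengthens the hardware critical path.
    LUT = {}
    for code, sym in TOY_VLC.items():
        n_pad = MAX_LEN - len(code)
        for pad in range(2 ** n_pad):
            suffix = format(pad, f"0{n_pad}b") if n_pad else ""
            LUT[code + suffix] = (sym, len(code))

    def decode(bits):
        """Decode a complete, valid bitstring."""
        out, pos = [], 0
        while pos < len(bits):
            sym, n = LUT[bits[pos:pos + MAX_LEN].ljust(MAX_LEN, "0")]
            out.append(sym)
            pos += n
        return out

    print(decode("101001"))  # -> [0, 1, 2]
    ```
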
  • Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors

    Page(s): 43 - 51
    PDF (1173 KB) | HTML

    The 2-D Discrete Wavelet Transform (DWT) consumes up to 68% of the JPEG2000 encoding time. In this paper, we develop efficient implementations of this important kernel on general-purpose processors (GPPs), in particular the Pentium 4 (P4). Efficient implementations of the 2-D DWT on the P4 must address three issues. First, the P4 suffers from a problem known as 64K aliasing, which can degrade performance by an order of magnitude. We propose two techniques to avoid 64K aliasing, improving performance by a factor of up to 4.20. Second, a straightforward implementation of vertical filtering incurs many cache misses. Cache performance can be improved by applying loop interchange, but many conflict misses remain if the filter length exceeds the cache associativity. Two methods are proposed to reduce the number of conflict misses, providing an additional performance improvement of up to a factor of 1.24. To show that these methods are general, results for the P3 and Opteron are also provided. Third, efficient implementations of the 2-D DWT must exploit the SIMD instructions supported by most GPPs, including the P4, and we present MMX and SSE implementations of horizontal and vertical filtering which provide maximum speedups of 3.39 and 6.72, respectively. (See the sketch below.)

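    The loop-interchange point can be seen directly in code. Below is a minimal sketch with a toy 3-tap filter; the paper's actual wavelet kernels and MMX/SSE code are not reproduced.

    ```python
    # Hedged sketch: naive vs. loop-interchanged vertical filtering.
    import numpy as np

    def vertical_naive(img, taps):
        """Column-at-a-time: strides down each column, so for wide images
        nearly every access touches a different cache line."""
        h, w = img.shape
        f = len(taps)
        out = np.zeros((h - f + 1, w))
        for c in range(w):
            for r in range(h - f + 1):
                out[r, c] = sum(taps[k] * img[r + k, c] for k in range(f))
        return out

    def vertical_interchanged(img, taps):
        """Loops interchanged: consecutive accesses sweep along rows, so
        whole cache lines are consumed before being evicted."""
        h, w = img.shape
        f = len(taps)
        out = np.zeros((h - f + 1, w))
        for r in range(h - f + 1):
            for k in range(f):
                out[r, :] += taps[k] * img[r + k, :]   # contiguous rows
        return out

    taps = [0.25, 0.5, 0.25]
    img = np.random.rand(128, 128)
    assert np.allclose(vertical_naive(img, taps), vertical_interchanged(img, taps))
    ```
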
  • A Graphical Model for Context-Aware Visual Content Recommendation

    Page(s): 52 - 62
    PDF (913 KB) | HTML

    Existing recommender systems provide an elegant solution to the information overload in current digital libraries such as the Internet archive. Nowadays, sensors that capture the user's contextual information, such as location and time, are becoming available and have raised the need to personalize recommendations for each user according to his/her changing needs in different contexts. In addition, visual documents carry rich textual and visual information that is not exploited by existing recommender systems. In this paper, we propose a new framework for context-aware recommendation of visual documents by modeling the user's needs, the context, and the visual document collection together in a unified model. We also address the user's need for diversified recommendations. Our pilot study showed the merits of our approach in content-based image retrieval.

  • Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detection

    Page(s): 63 - 73
    PDF (893 KB) | HTML

    A method that exploits an information theoretic framework to extract optimized audio features using video information is presented. A simple measure of mutual information (MI) between the resulting audio and video features allows the detection of the active speaker among different candidates. The method involves the optimization of an MI-based objective function. No approximation is needed to solve this optimization problem, neither for the estimation of the probability density functions (pdfs) of the features nor for the cost function itself. The pdfs are estimated from the samples using a nonparametric approach, and the challenging optimization problem is solved using a global method: the differential evolution algorithm. Two information theoretic optimization criteria are compared, and their ability to extract audio features specific to speech production is discussed. Using these specific audio features, candidate video features are then classified as members of the "speaker" or "non-speaker" class, resulting in a speaker detection scheme. As a result, our method achieves a speaker detection rate of 100% on in-house test sequences, and of 85% on the most commonly used sequences. (See the sketch below.)

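    The detector ultimately reduces to computing MI between feature tracks. Below is a minimal sketch using a plain histogram MI estimate and synthetic tracks; the paper's nonparametric density estimates and differential evolution step are not reproduced.

    ```python
    # Hedged sketch: histogram MI between 1-D audio and video features.
    import numpy as np

    def mutual_information(a, v, bins=16):
        """I(A;V) = sum p(a,v) * log( p(a,v) / (p(a) p(v)) )."""
        joint, _, _ = np.histogram2d(a, v, bins=bins)
        p_av = joint / joint.sum()
        p_a = p_av.sum(axis=1, keepdims=True)
        p_v = p_av.sum(axis=0, keepdims=True)
        nz = p_av > 0
        return float((p_av[nz] * np.log(p_av[nz] / (p_a @ p_v)[nz])).sum())

    # Toy detection: the candidate whose video feature track has the
    # highest MI with the audio track is declared the active speaker.
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(1000)
    candidates = {"left": rng.standard_normal(1000),
                  "right": 0.8 * audio + 0.2 * rng.standard_normal(1000)}
    print(max(candidates, key=lambda k: mutual_information(audio, candidates[k])))
    # -> "right"
    ```
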
  • Audio Keywords Discovery for Text-Like Audio Content Analysis and Retrieval

    Page(s): 74 - 85
    PDF (928 KB) | HTML

    Inspired by classical text document analysis employing the concept of (key) words, this paper presents an unsupervised approach to discovering (key) audio elements in general audio documents. The (key) audio elements can be considered the equivalents of text (key) words, and they enable content-based audio analysis and retrieval following the analogy to proven text analysis theories and methods. Since general audio signals usually show complicated and strongly varying distributions and densities in the feature space, we propose an iterative spectral clustering method with context-dependent scaling factors to decompose an audio data stream into audio elements. Using this clustering method, temporal signal segments with similar low-level features are grouped into natural clusters that we adopt as audio elements. To detect the audio elements that are most representative of the semantic content, that is, the key audio elements, two cases are considered. If only one audio document is available for analysis, a number of heuristic importance indicators are defined and employed to detect the key audio elements. For the case in which multiple audio documents are available, more sophisticated measures of audio element importance are proposed, including expected term frequency (ETF), expected inverse document frequency (EIDF), expected term duration (ETD), and expected inverse document duration (EIDD). Our experiments showed encouraging results regarding the quality of the obtained (key) audio elements and their potential applicability to content-based audio document analysis and retrieval. (See the sketch below.)

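    The importance measures parallel TF-IDF. Below is a minimal sketch in which plain counts stand in for the paper's expectation-weighted ETF/EIDF (a simplifying assumption), with audio documents represented as lists of audio-element labels produced by clustering.

    ```python
    # Hedged sketch: TF-IDF-style scoring of audio elements.
    import math
    from collections import Counter

    def key_element_scores(docs):
        """docs: list of documents, each a list of audio-element labels.
        Returns the best TF-IDF score seen for each element."""
        n_docs = len(docs)
        df = Counter(e for d in docs for e in set(d))   # document frequency
        scores = {}
        for d in docs:
            tf = Counter(d)
            for e, c in tf.items():
                score = (c / len(d)) * math.log(n_docs / df[e])
                scores[e] = max(scores.get(e, 0.0), score)
        return scores

    docs = [["applause", "speech", "speech", "music"],
            ["speech", "speech", "music"],
            ["whistle", "cheer", "speech"]]
    print(key_element_scores(docs))  # "speech" scores 0: it is everywhere
    ```
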
  • Face Annotation Using Transductive Kernel Fisher Discriminant

    Page(s): 86 - 96
    PDF (1019 KB) | HTML

    Face annotation in images and videos enjoys many potential applications in multimedia information retrieval. Face annotation usually requires a large amount of hand-labeled training data to build effective classifiers. This is particularly challenging when annotating faces in large-scale collections of media data, where the labeling effort would be prohibitively expensive. As a result, traditional supervised face annotation methods often suffer from insufficient training data. To address this challenge, we propose in this paper a novel Transductive Kernel Fisher Discriminant (TKFD) scheme for face annotation, which outperforms traditional supervised annotation methods when training data are scarce. The main idea of our approach is to solve Fisher's discriminant using deformed kernels that incorporate the information of both labeled and unlabeled data. To evaluate the effectiveness of our method, we have conducted extensive experiments on three types of multimedia testbeds: the FRGC benchmark face dataset, the Yahoo! web image collection, and the TRECVID video data collection. The experimental results show that our TKFD algorithm is more effective than traditional supervised approaches, especially when there are very few training data. (See the sketch below.)

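    For reference, the supervised baseline that TKFD extends is the kernel Fisher discriminant, solvable in dual form with a regularized linear system. Below is a minimal sketch of that baseline; the transductive kernel deformation using unlabeled data is the paper's contribution and is not reproduced.

    ```python
    # Hedged sketch: two-class kernel Fisher discriminant (dual form).
    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kfd_train(X, y, reg=1e-2, gamma=0.5):
        """Returns dual coefficients alpha with f(x) = sum_i alpha_i k(x_i, x);
        alpha is proportional to N^{-1}(m1 - m0), where N is the within-class
        scatter in the kernel-induced feature space."""
        K = rbf_kernel(X, X, gamma)
        m0, m1 = K[:, y == 0].mean(axis=1), K[:, y == 1].mean(axis=1)
        N = np.zeros_like(K)
        for c, m in ((0, m0), (1, m1)):
            Kc = K[:, y == c] - m[:, None]
            N += Kc @ Kc.T
        return np.linalg.solve(N + reg * np.eye(len(X)), m1 - m0)

    X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 3])
    y = np.array([0] * 20 + [1] * 20)
    scores = rbf_kernel(X, X) @ kfd_train(X, y)   # higher -> class 1
    ```
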
  • Intra/Inter Macroblock Mode Decision for Error-Resilient Transcoding

    Page(s): 97 - 104
    PDF (728 KB) | HTML

    When transmitting a precoded bitstream over an error-prone network, error-resilient transcoding is adopted to convert the bitstream to a resilient format for robust delivery. Intra refreshment is an efficient tool for reducing the dependency between frames and stopping channel error propagation. In the conventional scheme, rate-distortion optimized macroblock mode decision is employed to adaptively determine the coding mode of each macroblock. However, this scheme only considers the channel error propagated from previous frames to the current frame. In contrast, this paper proposes a method that considers two consecutive frames of a sequence, thus taking the error propagation to the following frame into account. This enhances the overall robustness of the transcoded bitstream against packet loss. Depending on the availability of next-frame information, two cases are discussed. Experimental results show that the proposed methods improve quality compared with the conventional rate-distortion optimized error-resilient coding scheme under different test environments, with PSNR improvements as high as 0.9 dB. (See the sketch below.)

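    The mode decision is a Lagrangian cost comparison in which only inter mode pays an expected error-propagation penalty. Below is a minimal one-frame sketch with illustrative numbers and distortion model; the paper's contribution is extending the propagation term to the following frame as well.

    ```python
    # Hedged sketch: error-resilient intra/inter mode decision.
    def choose_mode(d_intra, r_intra, d_inter, r_inter, d_prop, p_loss, lam):
        """d_*: coding distortion, r_*: rate in bits, d_prop: distortion if
        channel errors propagate into this MB's reference, p_loss: packet
        loss probability, lam: Lagrange multiplier. Intra coding cuts the
        dependency on the previous frame, so it pays no propagation term."""
        j_intra = d_intra + lam * r_intra
        j_inter = d_inter + p_loss * d_prop + lam * r_inter
        return "intra" if j_intra <= j_inter else "inter"

    # Inter is cheaper in rate, but a lossy channel flips the decision:
    print(choose_mode(d_intra=120, r_intra=300, d_inter=80, r_inter=120,
                      d_prop=900, p_loss=0.2, lam=0.4))   # -> "intra"
    # With p_loss=0.0 the same call returns "inter".
    ```
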
  • Meet In the Middle Cross-Layer Adaptation for Audiovisual Content Delivery

    Page(s): 105 - 120
    PDF (1739 KB) | HTML

    This paper describes a new architecture and implementation of an adaptive streaming system (e.g., television over IP, video on demand) based on cross-layer interactions. At the center of the proposed architecture is the meet-in-the-middle concept, involving both bottom-up and top-down cross-layer interactions. Each streaming session is entirely controlled at the RTP layer, where we maintain a rich context that centralizes the collection of (i) instantaneous network conditions measured at the underlying layers (i.e., link, network, and transport layers) and (ii) user- and terminal-triggered events that impose new real-time QoS adaptation strategies. Thus, each active multimedia session is tied to a broad range of parameters that enable it to coordinate QoS adaptation throughout the protocol layers, eliminating overhead and preventing counterproductive interactions among separate mechanisms implemented at different layers. The MPEG-21 framework is used to provide common support for implementing and managing the end-to-end QoS of audio/video streams. Performance evaluations using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) objective video quality metrics show the benefits of the proposed meet-in-the-middle cross-layer design compared to traditional media delivery approaches. (See the sketch below.)

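    The key architectural move is a single per-session context that sees both bottom-up measurements and top-down events, so one decision point coordinates adaptation. Below is a minimal sketch with invented field names and thresholds; these are assumptions, not the paper's interface.

    ```python
    # Hedged sketch: a centralized cross-layer session context.
    from dataclasses import dataclass

    @dataclass
    class SessionContext:
        link_loss: float       # bottom-up: link layer
        rtt_ms: float          # bottom-up: transport layer
        bandwidth_kbps: float  # bottom-up: network layer
        screen_height: int     # top-down: terminal capability
        user_paused: bool      # top-down: user event

    def adapt(ctx: SessionContext) -> str:
        """One decision point sees everything, so, e.g., the link layer is
        not adding FEC while the video layer independently drops layers
        for the same loss event."""
        if ctx.user_paused:
            return "prefetch-only"
        if ctx.link_loss > 0.05:
            return "add-fec-and-drop-top-layer"
        if ctx.bandwidth_kbps < 800 or ctx.screen_height < 480:
            return "send-base-layer-only"
        return "send-all-layers"

    print(adapt(SessionContext(0.02, 80, 600, 1080, False)))
    # -> "send-base-layer-only"
    ```
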
  • Optimal Coding of Multilayer and Multiversion Video Streams

    Page(s): 121 - 131
    PDF (775 KB) | HTML

    Traditional video servers partially cope with heterogeneous client populations by maintaining a few versions of the same stream with different bit rates. More recent video servers leverage multilayer scalable coding techniques to customize the quality for individual clients. In both cases, heuristic, error-prone techniques are currently used by administrators to determine either the rate of each stream version or the granularity and rate of each layer in a multilayer scalable stream. In this paper, we propose an algorithm to determine the optimal rate and encoding granularity of each layer in a scalable video stream, maximizing a system-defined utility function for a given client distribution. The proposed algorithm can also be used to compute the optimal rates of multiversion streams. Our algorithm is general in the sense that it can employ arbitrary client utility functions. We implement our algorithm and verify its optimality, and we show how various structurings of scalable video streams affect client utilities. To demonstrate the generality of our algorithm, we consider three utility functions in our experiments. These utility functions model various aspects of streaming systems, including the effective rate received by clients, the mismatch between client bandwidth and received stream rate, and the client-perceived quality in terms of PSNR. We compare our algorithm against a heuristic algorithm previously used in the literature, and we show that ours outperforms it in all cases. (See the sketch below.)

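    For a small grid of candidate rates, the optimization can even be done by brute force, which makes the objective concrete. Below is a sketch for two layers under the effective-rate utility, where a client receives the largest cumulative layer rate not exceeding its bandwidth; the paper's algorithm is efficient and handles general utilities, and this is not it.

    ```python
    # Hedged sketch: brute-force optimal two-layer rates.
    from itertools import combinations

    def best_two_layer_rates(client_bw, grid):
        """Maximize summed effective rate over a measured client bandwidth
        distribution; rates are cumulative (base, base + enhancement)."""
        def total_utility(rates):
            return sum(max((r for r in rates if r <= bw), default=0)
                       for bw in client_bw)
        return max(combinations(sorted(grid), 2), key=total_utility)

    clients = [300, 350, 900, 1000, 1200]      # kbps, measured distribution
    print(best_two_layer_rates(clients, range(100, 1300, 100)))
    # -> (300, 900): low-end clients get the base layer, the rest get both
    ```
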
  • Stochastic Optimization for Content Sharing in P2P Systems

    Page(s): 132 - 144
    PDF (1038 KB) | HTML

    Available resources in peer-to-peer (P2P) systems depend strongly on the resource contributions made by individual peers. Empirical data show that, in the absence of incentives, a majority of participating peers do not contribute resources. Modeling interactions between individual peers is often difficult, as the number of peers in the system can be very large and the relationships among them can be very complex. In this paper, we propose a new solution for P2P systems in which peers upload and download content to and from contributing peers based on agreed-upon or determined sharing rates. Our solution deters free-riders by imposing constraints on participating peers; specifically, a peer is allowed access to new content only as long as its own content contribution exceeds an adaptively set threshold. The constraints are enforced either by a central authority (e.g., a tracker) or by a decentralized coalition of peers in a swarm, social network, etc. We derive optimal upload policies for the peers given their estimated future download requirements and their previous contribution (credit) to the other peers. Our results show a considerable improvement in the cost-benefit tradeoff for peers that deploy such an optimal policy compared to heuristic upload policies. We also propose mechanisms by which the coalition of peers can provide incentives or penalties to participating peers to adjust their policies such that the availability of content and/or the number of contributing peers is maximized. (See the sketch below.)

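    The access rule itself is simple to state in code. Below is a minimal sketch of the threshold constraint; the adaptive update used here (a moving average of observed demand) is an illustrative assumption rather than the paper's rule.

    ```python
    # Hedged sketch: contribution-threshold access control at a tracker.
    class Peer:
        def __init__(self):
            self.uploaded = 0.0
            self.downloaded = 0.0

    class Tracker:
        def __init__(self, threshold=1.0, alpha=0.1):
            self.threshold = threshold   # adaptively set contribution bar
            self.alpha = alpha

        def may_download(self, peer):
            """New content only while contribution exceeds the threshold."""
            return peer.uploaded >= self.threshold

        def record(self, peer, up, down):
            peer.uploaded += up
            peer.downloaded += down
            # drift the bar toward observed per-peer demand (assumption)
            self.threshold = (1 - self.alpha) * self.threshold + self.alpha * down

    tracker, freerider, seeder = Tracker(), Peer(), Peer()
    tracker.record(seeder, up=5.0, down=2.0)
    print(tracker.may_download(seeder), tracker.may_download(freerider))
    # -> True False
    ```
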
  • Content-Based Information Fusion for Semi-Supervised Music Genre Classification

    Page(s): 145 - 152
    PDF (2887 KB) | HTML

    In this paper, we propose an information fusion framework for the semi-supervised, distance-based music genre classification problem. We use the regularized least-squares framework as the basic classifier, which only involves the similarity scores among different music tracks. We present a fused similarity score that multiplies individual scores based on different distance measures; in particular, the distance measures are not restricted to the Euclidean distance. By attaching a weight to each single-distance-based score, we propose an expectation-maximization (EM) algorithm to adaptively learn the fusion weights. Experiments on a real music data set show that our approach gives promising results. (See the sketch below.)

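    The multiplicative fusion has a compact closed form. Below is a minimal sketch with fixed weights; the EM weight-learning step is the paper's contribution and is replaced here by a constant weight vector.

    ```python
    # Hedged sketch: weighted multiplicative fusion of similarity scores.
    import numpy as np

    def fused_similarity(dist_matrices, weights, sigma=1.0):
        """Each distance matrix D_k (not necessarily Euclidean) becomes a
        similarity exp(-D_k / sigma); the fused score is the weighted
        product, i.e. exp(-sum_k w_k D_k / sigma)."""
        fused = np.ones_like(dist_matrices[0])
        for D, w in zip(dist_matrices, weights):
            fused *= np.exp(-D / sigma) ** w
        return fused

    # Two toy distance measures over four tracks (say, timbre and rhythm)
    rng = np.random.default_rng(0)
    D1 = rng.random((4, 4)); D1 = (D1 + D1.T) / 2
    D2 = rng.random((4, 4)); D2 = (D2 + D2.T) / 2
    S = fused_similarity([D1, D2], weights=[0.7, 0.3])
    ```
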
  • Performance Analysis of Resource Selection Schemes for a Large Scale Video-on-Demand System

    Page(s): 153 - 159
    PDF (318 KB) | HTML

    The designers of a large-scale video-on-demand system face the optimization problem of deciding how to assign movies to multiple disks (servers) such that the request blocking probability is minimized subject to capacity constraints. To solve this problem, it is essential to develop scalable and accurate analytical means of evaluating the blocking performance of the system for a given file assignment. The performance analysis is further complicated by the fact that the request blocking probability also depends on how disks are selected to serve user requests for multicopy movies. In this paper, we analyze several efficient resource selection schemes. Numerical results demonstrate that our analysis is scalable and sufficiently accurate to support the task of file assignment optimization in such a system. (See the sketch below.)

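    A classical single-disk baseline for this kind of blocking analysis is the Erlang-B recursion; the paper's analysis extends to multicopy movies and disk-selection schemes, which this baseline does not capture.

    ```python
    # Hedged sketch: Erlang-B blocking probability for one disk.
    def erlang_b(servers, offered_load):
        """Blocking probability for `servers` concurrent streams under
        Poisson request arrivals of `offered_load` Erlangs, computed with
        the stable recursion B(n) = a B(n-1) / (n + a B(n-1)), B(0) = 1."""
        b = 1.0
        for n in range(1, servers + 1):
            b = offered_load * b / (n + offered_load * b)
        return b

    # A disk that can stream 30 videos at once, offered 25 Erlangs:
    print(f"{erlang_b(30, 25.0):.4f}")   # a few percent of requests blocked
    ```
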
  • Comments on "Scalable Services via Egress Admission Control"

    Page(s): 160 - 161
    PDF (99 KB) | HTML

    In the recent paper by Cetinkaya et al., admission control tests are derived by approximating the sum of two Gumbel-distributed random variables by a Gumbel-distributed random variable. In fact, the sum of two Gumbel-distributed random variables does not follow a Gumbel distribution. Here, explicit expressions are derived for the probability density function (pdf) and the cumulative distribution function (cdf) of the exact distribution of the sum. The discrepancy between the exact and the approximate distributions is studied numerically. (See the sketch below.)

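    The point is easy to verify numerically. For two i.i.d. standard Gumbel variables, a standard integral gives the exact density of the sum as $f_S(s) = 2 e^{-s} K_0(2 e^{-s/2})$, with $K_0$ the modified Bessel function; the paper treats the general, not necessarily identical, case. Below is a sketch comparing this density with the best-fitting single Gumbel.

    ```python
    # Hedged sketch: exact density of a Gumbel sum vs. a fitted Gumbel.
    import numpy as np
    from scipy import stats
    from scipy.special import k0

    s = np.linspace(-2, 8, 6)
    exact = 2 * np.exp(-s) * k0(2 * np.exp(-s / 2))   # i.i.d. standard case

    samples = (stats.gumbel_r.rvs(size=500_000, random_state=1)
               + stats.gumbel_r.rvs(size=500_000, random_state=2))
    loc, scale = stats.gumbel_r.fit(samples)
    approx = stats.gumbel_r.pdf(s, loc, scale)

    # The exact and fitted columns visibly disagree: the sum is not Gumbel.
    print(np.c_[s, exact, approx])
    ```
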
  • IEEE Transactions on Multimedia EDICS

    Page(s): 162
    PDF (13 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for Authors

    Page(s): 163 - 164
    PDF (44 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Society Information

    Page(s): C3
    PDF (25 KB)
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical is the various aspects of research in multimedia technology and applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo