
Multimedia, IEEE Transactions on

Issue 1 • Date Jan. 2007

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    Freely Available from IEEE
  • Model-Based Power Aware Compression Algorithms for MPEG-4 Virtual Human Animation in Mobile Environments

    Page(s): 1 - 8

    MPEG-4 body animation parameters (BAP) are used for animation of MPEG-4 compliant virtual human-like characters. Distributed virtual reality applications and networked games on mobile computers require access to locally stored or streamed compressed BAP data. Existing MPEG-4 BAP compression techniques are inefficient for streaming or storing BAP data on mobile computers because: 1) decompressing MPEG-4 compressed BAP data requires a significant number of CPU cycles, and hence unacceptable power consumption, 2) the lossy MPEG-4 technique of frame dropping to reduce network throughput during streaming leads to unacceptable animation degradation, and 3) lossy MPEG-4 compression does not exploit structural information in the virtual human model. In this article, we propose two novel algorithms for lossy compression of BAP data, termed BAP-Indexing and BAP-Sparsing. We demonstrate how an efficient combination of the two algorithms results in a lower network bandwidth requirement and reduced power for data decompression at the client end when compared to MPEG-4 compression. The hybrid algorithm exploits the structural information in the virtual human model, thus maintaining visually acceptable quality of the resulting animation upon decompression. Consequently, the hybrid algorithm for BAP data compression is ideal for streaming motion animation data to power- and network-constrained mobile computers.

  • Novel Point-Oriented Inner Searches for Fast Block Motion Estimation

    Page(s): 9 - 15

    Recently, an enhanced hexagon-based search (EHS) algorithm was proposed to speed up the original hexagon-based search (HS) using a 6-side-based fast inner search. However, judged by the distances between the inner search points and the coarse search points, this 6-side-based method is quite irregular, which would lower prediction accuracy. In this paper, a new point-oriented grouping strategy is proposed to develop fast inner search techniques for speeding up the HS and diamond search (DS) algorithms. Experimental results show that the new HS and DS using point-oriented inner searches are up to 30% faster than the original algorithms, with negligible peak signal-to-noise ratio degradation.

  • Render Sequence Encoding for Document Protection

    Page(s): 16 - 24

    We present in this paper a novel electronic document watermarking method, render sequence encoding (RSE), and then further develop an RSE authentication method for electronic documents. RSE watermarks an electronic document by modulating the display sequences of words or characters. It features large information-carrying capacity and robustness over document format transcoding. The RSE authentication method is based on the NP-complete exact traveling salesman problem, which provides a rigorous foundation for security. The method is secure in the sense that it is extremely difficult to forge the authentication process. The RSE authentication process is also easy to operate, especially in comparison to digital signatures, which require a public key infrastructure for their operation.

  • Automatic Meeting Segmentation Using Dynamic Bayesian Networks

    Page(s): 25 - 36

    Multiparty meetings are a ubiquitous feature of organizations, and there are considerable economic benefits that would arise from their automatic analysis and structuring. In this paper, we are concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion and presentation. We outline four families of multimodal features based on speaker turns, lexical transcription, prosody, and visual motion that are extracted from the raw audio and video recordings. We relate these low-level features to more complex group behaviors using a modelling framework based on multistream dynamic Bayesian networks (DBNs). This yields an effective approach to the segmentation problem, with an action error rate of 12.2%, compared with 43% using an approach based on hidden Markov models. Moreover, the multistream DBN developed here leaves scope for many further improvements and extensions.

  • Two-Dimensional Channel Coding Scheme for MCTF-Based Scalable Video Coding

    Page(s): 37 - 45

    Motion-compensated temporal filtering (MCTF)-based scalable video coding (SVC) provides full scalability, including spatial, temporal and signal-to-noise ratio (SNR) scalability with fine granularity, each of which may result in a different visual effect. This paper presents a novel approach to two-dimensional unequal error protection (2D UEP) for scalable video with combined temporal and quality (SNR) scalability over a packet-erasure channel. The bitstream is divided into scalable subbitstreams based on the structure of MCTF. Each subbitstream is further divided into several quality layers. Unequal quantities of bits are allocated to protect different layers to obtain acceptable quality video with smooth degradation under different transmission error conditions. Experimental results are presented to show the advantage of the proposed 2D UEP scheme over the traditional one-dimensional unequal error protection (1D UEP) scheme. Compared with the 1D UEP scheme on SNR layers, our method gives up to 0.81-dB improvement for some video sequences.

  • Visual Salience-Guided Mesh Decomposition

    Page(s): 46 - 57

    In this paper, we propose a novel mesh-decomposition scheme called "visual salience-guided mesh decomposition". The concept of "part salience", which originated in cognitive psychology, asserts that the salience of a part can be determined by (at least) three factors: the protrusion, the boundary strength, and the relative size of the part. We try to convert these conceptual rules into real computational processes, and use them to guide a three-dimensional (3D) mesh decomposition process in such a way that the significant components can be precisely identified and efficiently extracted from a given 3D mesh. The proposed decomposition scheme not only identifies the parts' boundaries defined by the minima rule, but also labels each part with a quantitative degree of visual salience during the mesh decomposition process. The experimental results show that the proposed scheme is indeed effective and powerful in decomposing a 3D mesh into its significant components.

  • Real-Time Motion Trajectory-Based Indexing and Retrieval of Video Sequences

    Page(s): 58 - 65

    This paper presents a novel motion trajectory-based compact indexing and efficient retrieval mechanism for video sequences. Assuming trajectory information is already available, we represent trajectories as a temporal ordering of subtrajectories. This approach solves the problem of trajectory representation when only partial trajectory information is available due to occlusion. It is achieved by a hypothesis testing-based method applied to curvature data computed from trajectories. The subtrajectories are then represented by their principal component analysis (PCA) coefficients for optimally compact representation. Different techniques are integrated to index and retrieve subtrajectories, including PCA, spectral clustering, and string matching. We assume a query-by-example mechanism where an example trajectory is presented to the system and the search system returns a ranked list of the most similar items in the dataset. Experiments based on datasets obtained from University of California at Irvine's KDD archives and Columbia University's DVMM group demonstrate the superiority of our proposed PCA-based approaches in terms of indexing and retrieval times and precision-recall ratios, when compared to other techniques in the literature.

  • Modeling and Mining of Users' Capture Intention for Home Videos

    Page(s): 66 - 77

    With the rapid adoption of consumer digital video recorders and an increase of home video data, content analysis has become an interesting and key research issue to provide personalized experiences and services for both camcorder users and viewers. In this paper, we present a novel view to tackle this issue, which aims at modeling and mining the capture intention of camcorder users. Based on the study of the intention mechanism in psychology, a set of domain-specific capture intention concepts is defined. A comprehensive and extensible scheme consisting of video structure decomposition, intention-oriented feature analysis, singular-value-decomposition-based intention segmentation, and learning-based intention classification is proposed to mine the users' capture intention. Experiments were carried out on home video sequences of 90 h in total, taken by 16 persons over the past 20 years. Both the user study and objective evaluations indicate that our proposed intention-based approach is an effective complement to existing home video content analysis schemes.

  • A Bayesian 3-D Search Engine Using Adaptive Views Clustering

    Page(s): 78 - 88

    In this paper, we propose a method for three-dimensional (3D)-model indexing based on two-dimensional (2D) views, which we call adaptive views clustering (AVC). The goal of this method is to provide an "optimal" selection of 2D views from a 3D model, and a probabilistic Bayesian method for 3D-model retrieval from these views. The characteristic view selection algorithm is based on an adaptive clustering algorithm and uses statistical model distribution scores to select the optimal number of views. Starting from the fact that not all views have equal importance, we also introduce a novel Bayesian approach to improve the retrieval. Finally, we present our results and compare our method to some state-of-the-art 3D retrieval descriptors on the Princeton 3D Shape Benchmark database and a 3D-CAD-models database supplied by the car manufacturer Renault.

  • Major Cast Detection in Video Using Both Speaker and Face Information

    Page(s): 89 - 101

    Major casts, for example, the anchor persons or reporters in news broadcast programs and the principal characters in movies, play an important role in video, and their occurrences provide meaningful indices for organizing and presenting video content. This paper describes a new approach for automatically generating a list of major casts in a video sequence based on multiple modalities, specifically, speaker information in the audio track and face information in the video track. The core algorithm is composed of three steps. First, speaker boundaries are detected and speaker segments are clustered in the audio stream. Second, face appearances are tracked and face tracks are clustered in the video stream. Finally, correspondences between speakers and faces are determined based on their temporal co-occurrence. A list of major casts is constructed and ranked in an order that reflects each cast's importance, which is determined by the accumulative temporal and spatial presence of the cast. The proposed algorithm has been integrated in a major cast-based video browsing system, which presents the face icon and marks the speech locations in the time stream for each detected major cast. The system provides a semantically meaningful summary of the video content, which helps the user to effectively digest the theme of the video.

  • Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases

    Page(s): 102 - 119

    One of the challenges in the development of a content-based multimedia indexing and retrieval application is to achieve an efficient indexing scheme. The developers and users who are accustomed to making queries to retrieve a particular multimedia item from a large-scale database can be frustrated by the long query times. Conventional indexing structures cannot usually cope with the requirements of a multimedia database, such as dynamic indexing or the presence of high-dimensional audiovisual features. Such structures do not scale well with the ever-increasing size of multimedia databases whilst inducing corruption and resulting in an over-crowded indexing structure. This paper addresses such problems and presents a novel indexing technique, hierarchical cellular tree (HCT), which is designed to bring an effective solution especially for indexing large multimedia databases. Furthermore, it provides an enhanced browsing capability, which enables the user to make a guided tour within the database. A pre-emptive cell-search mechanism is introduced in order to prevent corruption, which may occur due to erroneous item insertions. Among the hierarchical levels that are built in a bottom-up fashion, similar items are collected into appropriate cellular structures at some level. Cells are subject to mitosis operations when the dissimilarity exceeds a required level. By mitosis operations, cells are kept focused and compact, and yet they can grow into any dimension as long as the compactness is maintained. The proposed indexing scheme is then used along with a recently introduced query method, the progressive query, in order to achieve the ultimate goal from the user's point of view, namely retrieval of the most relevant items in the earliest possible time regardless of the database size. Experimental results show that the speed of retrievals is significantly improved and the indexing structure shows no sign of degradation when the database size is increased. Furthermore, the HCT indexing body can conveniently be used for efficient browsing and navigation operations among the multimedia database items.

  • Edge Potential Functions (EPF) and Genetic Algorithms (GA) for Edge-Based Matching of Visual Objects

    Page(s): 120 - 135

    Edges are known to be a semantically rich representation of the contents of a digital image. Nevertheless, their use in practical applications is sometimes limited by computation and complexity constraints. In this paper, a new approach is presented that addresses the problem of matching visual objects in digital images by combining the concept of edge potential functions (EPFs) with a powerful matching tool based on genetic algorithms (GAs). EPFs can be easily calculated starting from an edge map and provide a kind of attractive pattern for a matching contour, which is conveniently exploited by GAs. Several tests were performed in the framework of different image matching applications. The results achieved clearly outline the potential of the proposed method as compared to state-of-the-art methodologies.

  • Scene Parsing Using Region-Based Generative Models

    Page(s): 136 - 146

    Semantic scene classification is a challenging problem in computer vision. In contrast to the common approach of using low-level features computed from the whole scene, we propose "scene parsing" utilizing semantic object detectors (e.g., sky, foliage, and pavement) and region-based scene-configuration models. Because semantic detectors are faulty in practice, it is critical to develop a region-based generative model of outdoor scenes based on characteristic objects in the scene and spatial relationships between them. Since a fully connected scene configuration model is intractable, we chose to model pairwise relationships between regions and estimate scene probabilities using loopy belief propagation on a factor graph. We demonstrate the promise of this approach on a set of over 2000 outdoor photographs, comparing it with existing discriminative approaches and those using low-level features.

  • Fragmental Proxy Caching for Streaming Multimedia Objects

    Page(s): 147 - 156

    In this paper, a fragmental proxy-caching scheme that efficiently manages the streaming multimedia data in the proxy cache is proposed to improve the quality of streaming multimedia services. The novel data-fragmentation method in this scheme not only provides finer granularity caching units to allow more effective cache replacement, but also offers a unique and natural way of handling the interactive VCR functions in the proxy-caching environment. Furthermore, a cache-replacement scheme, based on user request arrival rates for different multimedia objects and the playback rates of these objects, is proposed to address the drawbacks in existing cache-replacement schemes, most of which consider only the user access frequencies in their cache-replacement decisions. In this cache-replacement scheme, a sliding history window is employed to monitor the dynamic user request arrivals, and a tunable-victimization procedure is used to manage the cached multimedia data in accordance with the different quality-of-service requirements of streaming multimedia applications. Performance studies demonstrate that the fragmental proxy-caching scheme significantly outperforms other caching schemes in terms of byte-hit ratio and the number of delayed starts, and can be tuned to either maximize the byte-hit ratio or minimize the number of delayed starts.

  • Quantum-Based Earliest Deadline First Scheduling for Multiservices

    Page(s): 157 - 168

    Latency-rate (LR) schedulers have shown their ability to provide fair and weighted sharing of bandwidth with an upper bound on the delivery latency of packets, while earliest deadline first (EDF) schedulers have shown their ability to provide LR-decoupled service whereby the delivery latency of packets is not bounded by the reserved rate. However, EDF schedulers require traffic shapers to ensure flow protection. We propose quantum-based earliest deadline first scheduling (QEDF), a quantum-based scheduler that provides flow protection, throughput guarantee and delay bound guarantee for flows that require LR-coupled and LR-decoupled types of reservations. It classifies flows into time-critical (TC), jitter-sensitive (JS), and rate-based (RB) classes and uses a quality-of-service forwarding rule to determine the next packet to be serviced by the scheduler. It provides nonpreemptive priority service to TC queues. This allows LR-decoupled reservation for flows that have a low rate and cannot tolerate delay. Packets from JS queues can be delayed by other packets if forwarding the latter will not cause the former to miss their deadlines. As a quantum-based scheduler, the QEDF scheduler provides throughput guarantees for RB queues. We present both analytical and simulation results for QEDF, evaluating it in its deployment as a single-class as well as a multiservice scheduler.

  • Fast Bitstream Switching Algorithms for Real-Time Adaptive Video Multicasting

    Page(s): 169 - 175

    Bitstream switching among multiple bitstreams encoded at different bit rates is an effective way to address the bandwidth variation issue in transmitting multimedia over the Internet or wireless networks. This paper proposes two new fast real-time bitstream switching algorithms that aim to minimize the drifting error, while avoiding the problems of long delay, high complexity and bit-rate overhead for storage and transmission that often occur in prior solutions. The basic idea is to choose a switching point in a neighborhood with the highest encoding quality, within a switching window determined by the switching delay constraint. We show that the proposed algorithms can significantly outperform a simple switching algorithm, and achieve performance that is closer to an offline mean-square-error-optimized bitstream switching solution, when compared to our previous work based on the similarity of the reference frames. The proposed schemes are especially useful in the scenario of real-time multicasting over dynamic heterogeneous networks, where multiple bitstreams with different bit rates are generated on the fly and dynamic bitstream switching is required for individual clients.

  • Cycle-Based Rate Control for One-Way and Interactive Video Communications Over Wireless Channels

    Page(s): 176 - 184

    We propose a joint source-rate/channel-code control scheme for streaming video over a wireless channel. The scheme is designed to maximize the achievable source rate while guaranteeing an upper bound on the probability of starvation at the playback buffer. It can be applied to both one-way and interactive video communications. Rate control is performed adaptively on a per-cycle basis, where a cycle consists of a "good" channel period and the ensuing "bad" period. This cycle-based approach has two advantages. First, it reduces the fluctuations in the source bit rate, ensuring smooth variations in video quality. Second, it makes it possible to derive simple expressions for the starvation probability at the playback buffer, which we use to determine the optimal source rate and channel code for the good and bad periods of the subsequent cycle.

  • Cross-Layer Packetization and Retransmission Strategies for Delay-Sensitive Wireless Multimedia Transmission

    Page(s): 185 - 197

    Existing wireless networks provide dynamically varying resources with only limited support for the quality of service required by bandwidth-intense, loss-tolerant and delay-sensitive multimedia applications. This variability of resources does not significantly impact delay-insensitive data transmission (e.g., file transfers), but has considerable consequences for multimedia applications. Recently, the research focus has been to adapt existing algorithms and protocols at the lower layers of the protocol stack to better support multimedia transmission applications and, conversely, to modify application-layer solutions to cope with the varying wireless network resources. In this paper, we show that significant improvements in wireless multimedia performance can be obtained by deploying a joint application-layer adaptive packetization and prioritized scheduling and MAC-layer retransmission strategy. We deploy a state-of-the-art wavelet coder for the compression of the video data that enables on-the-fly adaptation to changing channel conditions and inherent prioritization of the video bitstream. We pose the cross-layer problem as a distortion minimization given delay constraints and derive analytical solutions by modifying existing joint source-channel coding theory aimed at fulfilling rate, rather than delay, constraints. We also propose real-time algorithms that explicitly consider the available information about previously transmitted packets. The obtained results show significant improvements in video quality compared with the ad hoc optimizations currently deployed, while the complexity associated with performing this optimization in real time, i.e., at transmission time, is limited.

  • Real-Time Whiteboard Capture and Processing Using a Video Camera for Remote Collaboration

    Page(s): 198 - 206

    This paper describes our recently developed system which captures pen strokes on physical whiteboards in real time using an off-the-shelf video camera. Unlike many existing tools, our system does not instrument the pens or the whiteboard. It analyzes the sequence of captured video images in real time, classifies the pixels into whiteboard background, pen strokes and foreground objects (e.g., people in front of the whiteboard), extracts newly written pen strokes, and corrects the color to make the whiteboard completely white. This allows us to transmit whiteboard contents using very low bandwidth to remote meeting participants. Combined with other teleconferencing tools such as voice conference and application sharing, our system becomes a powerful tool to share ideas during online meetings.

  • IEEE Transactions on Multimedia EDICS

    Page(s): 207
    Freely Available from IEEE
  • IEEE Transactions on Multimedia information for authors

    Page(s): 208 - 209
    Freely Available from IEEE
  • 2007 International Workshop on Multimedia Signal Processing (MMSP'07)

    Page(s): 210
    Freely Available from IEEE
  • Special issue on new approaches to statistical speech and text processing

    Page(s): 211
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical is the various aspects of research in multimedia technology and applications of multimedia.

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo