
IEEE Transactions on Multimedia

Issue 3 • April 2008


Displaying Results 1 - 25 of 27
  • Table of contents

    Publication Year: 2008 , Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Publication Year: 2008 , Page(s): C2
    Freely Available from IEEE
  • Compression of 3-D Point Visual Data Using Vector Quantization and Rate-Distortion Optimization

    Publication Year: 2008 , Page(s): 305 - 315
    Cited by:  Papers (3)

    In this paper, we propose adaptive and flexible quantization and compression algorithms for 3-D point data using vector quantization (VQ) and rate-distortion (R-D) optimization. The point data are composed of the position and radius of each sphere in the QSplat representation. The positions of child spheres are first transformed to a local coordinate system, which is determined by the parent-children relationship. The local coordinate transform makes the positions more compactly distributed in 3-D space, facilitating an effective application of VQ. We also develop a constrained encoding method for the radius data, which provides hole-free surface rendering at the decoder side. Furthermore, an R-D optimized compression algorithm is proposed to allocate an optimal bitrate to each sphere. Experimental results show that the proposed algorithm can effectively compress the original 3-D point geometry at various bitrates.
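
    The per-sphere bit allocation can be pictured with a small Lagrangian selection sketch in Python; the candidate codebooks and their (rate, distortion) pairs below are illustrative placeholders, not the paper's data.

        # Lagrangian R-D selection: for each sphere, pick the codebook
        # minimizing J = D + lambda * R over the available candidates.
        def rd_select(candidates, lam):
            """candidates: list of (rate_bits, distortion) pairs."""
            return min(candidates, key=lambda rd: rd[1] + lam * rd[0])

        # Three hypothetical codebooks of increasing rate, decreasing distortion.
        candidates = [(4, 0.90), (8, 0.35), (12, 0.20)]
        for lam in (0.01, 0.05, 0.2):   # larger lambda favours lower rates
            rate, dist = rd_select(candidates, lam)
            print(f"lambda={lam}: {rate}-bit codebook (D={dist})")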
  • Joined Spectral Trees for Scalable SPIHT-Based Multispectral Image Compression

    Publication Year: 2008 , Page(s): 316 - 329
    Cited by:  Papers (3)

    In this paper, the compression of multispectral images is addressed. Such 3-D data are characterized by a high correlation across the spectral components. The efficiency of the state-of-the-art wavelet-based coder 3-D SPIHT is considered. Although the 3-D SPIHT algorithm provides the obvious way to process a multispectral image as a volumetric block and, consequently, maintain the attractive properties exhibited in 2-D (excellent performance, low complexity, and embeddedness of the bit-stream), its 3-D tree structure is shown to be inadequately suited to 3-D wavelet-transformed (DWT) multispectral images. The fact that each parent has eight children in the 3-D structure considerably enlarges the list of insignificant sets (LIS) and the list of insignificant pixels (LIP), since the partitioning of any set produces eight subsets which will be processed similarly during the sorting pass. Thus, a significant portion of the overall bit budget is wasted sorting insignificant information. Through an analysis of coding results, we demonstrate that a straightforward 2-D SPIHT technique, when suitably adjusted to maintain rate scalability and carried out in the 3-D DWT domain, overcomes this weakness. In addition, a new SPIHT-based scalable multispectral image compression algorithm is introduced that, in its initial iterations, exploits the redundancies within each group of two consecutive spectral bands. Numerical experiments on a number of multispectral images show that the proposed scheme provides significant improvements over related works.
  • Cryptanalysis of Some Multimedia Encryption Schemes

    Publication Year: 2008 , Page(s): 330 - 338
    Cited by:  Papers (30)

    Encryption is one of the fundamental technologies used in digital rights management. Unlike ordinary computer applications, multimedia applications generate large amounts of data that have to be processed in real time. Consequently, a number of encryption schemes for multimedia applications have been proposed in recent years. We analyze the following proposed methods for multimedia encryption: key-based multiple Huffman tables (MHT), arithmetic coding with key-based interval splitting (KSAC), and randomized arithmetic coding (RAC). Our analysis shows that MHT and KSAC are vulnerable to low-complexity known- and/or chosen-plaintext attacks. Although we do not provide any attacks on RAC, we point out some disadvantages of RAC compared with the classical compress-then-encrypt approach.
  • Orthogonal Data Embedding for Binary Images in Morphological Transform Domain- A High-Capacity Approach

    Publication Year: 2008 , Page(s): 339 - 351
    Cited by:  Papers (11)

    This paper proposes a data-hiding technique for binary images in the morphological transform domain for authentication purposes. To achieve blind watermark extraction, it is difficult to use the detail coefficients directly as a location map to determine the data-hiding locations. Hence, we view flipping an edge pixel in a binary image as shifting the edge location by one pixel horizontally or vertically. Based on this observation, we propose an interlaced morphological binary wavelet transform to track the shifted edges, which facilitates blind watermark extraction and the incorporation of a cryptographic signature. Unlike existing block-based approaches, in which the block size is constrained to 3×3 pixels or larger, we process an image in 2×2 pixel blocks. This allows flexibility in tracking the edges and also achieves low computational complexity. Two processing cases, chosen so that flipping the candidates of one does not affect the flippability conditions of the other, are employed for orthogonal embedding; this allows more suitable candidates to be identified, so that a larger capacity can be achieved. A novel and effective Backward-Forward Minimization method is proposed, which considers backward those neighboring embeddable candidates already processed and forward those unprocessed flippable candidates that may be affected by flipping the current pixel. In this way, the total visual distortion can be minimized. Experimental results demonstrate the validity of our arguments.
  • Real-Time Vision and Speech Driven Avatars for Multimedia Applications

    Publication Year: 2008 , Page(s): 352 - 360
    Cited by:  Papers (5)

    Recent progress in advanced video communication services and multimedia applications is grounded in novel human-machine interfaces, improved usability, and user friendliness driven by user-centric research and development. In this paper, we describe a complete system concept and the algorithmic details of an example application within this area. The key features of the system are vision- and speech-based interfaces, which are used to animate an avatar for an audio-visual representation of a communication partner. The system is applied in two application scenarios, namely video chat and customer care services. Both applications are mass-market oriented, and therefore the careful design and development of robust, supportive user interfaces are required. The presented approach is integrated into a complete real-time prototype system, which is permanently demonstrated in the showcase at the headquarters of Deutsche Telekom, Bonn, Germany.
  • VCode—Pervasive Data Transfer Using Video Barcode

    Publication Year: 2008 , Page(s): 361 - 371
    Cited by:  Papers (7)

    In this paper, we describe a novel data transfer scheme that uses the camera in a smart phone as an alternative data channel. The data is encoded as a sequence of 2-D barcode images, displayed on a flat panel display, acquired by the camera, and decoded in real time by software embedded in the device. The decoded data is written to a file. Compared with existing data channels, such as CDMA/GPRS, cables, Bluetooth, and infrared, our method relies on visual communication and does not require special hardware or data plans. Users only need to point the camera at a monitor displaying the VCode to download the data. Technical challenges to overcome include correction of perspective distortion, compensation for contrast variation, and efficient implementation of small-footprint software on a mobile device. We address these challenges and present our solution in detail. We have implemented a prototype which allows users to successfully download various types of files, including pictures, ring tones, and Java games, onto camera phones running the Symbian and Windows Mobile platforms. We discuss the limitations of our solution and outline future work to overcome them.
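
    The perspective-correction step maps the four detected corners of the on-screen code back to a square before decoding. A minimal sketch using OpenCV's homography routines follows; the corner coordinates and file names are hypothetical.

        import numpy as np
        import cv2  # OpenCV: a standard way to undo projective distortion

        # Hypothetical corner positions of the barcode in the camera frame
        # (order: top-left, top-right, bottom-right, bottom-left).
        src = np.float32([[41, 30], [608, 52], [590, 447], [25, 420]])
        side = 512                                 # output resolution
        dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])

        frame = cv2.imread("captured_frame.png")   # placeholder input image
        H = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography from 4 points
        rectified = cv2.warpPerspective(frame, H, (side, side))
        cv2.imwrite("rectified_code.png", rectified)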
  • Video-Based Human Movement Analysis and Its Application to Surveillance Systems

    Publication Year: 2008 , Page(s): 372 - 384
    Cited by:  Papers (28)  |  Patents (1)

    This paper presents a novel posture classification system that analyzes human movements directly from video sequences. In the system, each sequence of movements is converted into a posture sequence. To better characterize a posture in a sequence, we triangulate it into triangular meshes, from which we extract two features: the skeleton feature and the centroid context feature. The first feature is used as a coarse representation of the subject, while the second is used to derive a finer description. We adopt a depth-first search (DFS) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. The skeletal features extracted in the first stage are used to extract the centroid context feature, which is a finer representation that can characterize the shape of a whole body or of body parts. The two descriptors working together make human movement analysis very efficient and accurate because they generate a set of key postures from a movement sequence. The ordered key posture sequence is represented by a symbol string. Matching two arbitrary action sequences then becomes a symbol string matching problem. Our experimental results demonstrate that the proposed method is a robust, accurate, and powerful tool for human movement analysis.
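
    Once actions are symbol strings, any string metric can compare them; edit (Levenshtein) distance is a common choice. A minimal sketch with made-up posture strings, not the authors' exact matcher:

        # Edit distance between two key-posture strings, one row at a time.
        def edit_distance(a: str, b: str) -> int:
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,                # deletion
                                   cur[j - 1] + 1,             # insertion
                                   prev[j - 1] + (ca != cb)))  # substitution
                prev = cur
            return prev[-1]

        # Hypothetical sequences: each symbol is one detected key posture.
        print(edit_distance("ABBCD", "ABCCD"))  # -> 1 (one posture differs)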
  • Gossip-Based Computation of a Gaussian Mixture Model for Distributed Multimedia Indexing

    Publication Year: 2008 , Page(s): 385 - 392
    Cited by:  Papers (4)

    This paper deals with pattern recognition in a distributed computing context of the peer-to-peer type, which should become increasingly relevant for multimedia data indexing and retrieval. Our goal is to estimate class-conditional probability densities that take the form of Gaussian mixture models (GMM). We propagate GMMs through a network in a decentralized fashion (gossip) and aggregate GMMs from various sources using a technique that involves little computation and makes parsimonious use of network resources, as model parameters rather than data are transmitted. The aggregation is based on iterative optimization of an approximation of the KL divergence that allows closed-form computation between mixture models. Experimental results demonstrate the scheme on a speaker recognition task.
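
    The closed-form building block underneath such mixture approximations is the KL divergence between two Gaussian components. A sketch with illustrative parameters (not the paper's aggregation rule itself):

        import numpy as np

        def kl_gauss(mu0, S0, mu1, S1):
            """KL( N(mu0,S0) || N(mu1,S1) ), closed form for Gaussians."""
            d = mu0.shape[0]
            S1_inv = np.linalg.inv(S1)
            diff = mu1 - mu0
            return 0.5 * (np.trace(S1_inv @ S0)
                          + diff @ S1_inv @ diff
                          - d
                          + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

        mu0, S0 = np.zeros(2), np.eye(2)
        mu1, S1 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
        print(kl_gauss(mu0, S0, mu1, S1))   # ~0.44 nat for these components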
  • A Multimodal Scheme for Program Segmentation and Representation in Broadcast Video Streams

    Publication Year: 2008 , Page(s): 393 - 408
    Cited by:  Papers (14)

    With the advance of digital video recording and playback systems, the need to efficiently manage recorded TV programs is evident, so that users can readily locate and browse their favorite programs. In this paper, we propose a multimodal scheme to segment and represent TV video streams. The scheme aims to recover the temporal and structural characteristics of TV programs with visual, auditory, and textual information. In terms of visual cues, we develop a novel concept named program-oriented informative images (POIM) to identify candidate points correlated with the boundaries of individual programs. For audio cues, a multiscale Kullback-Leibler (K-L) distance is proposed to locate audio scene changes (ASC), and ASC is then aligned with video scene changes to represent candidate program boundaries. In addition, latent semantic analysis (LSA) is adopted to calculate the textual content similarity (TCS) between shots to model the inter-program similarity and intra-program dissimilarity in terms of speech content. Finally, we fuse the multimodal features of POIM, ASC, and TCS to detect the boundaries of programs, including individual commercials (spots). Toward an effective program guide and attractive content browsing, we propose a multimodal representation of individual programs that uses POIM images, key frames, and textual keywords in a summarization manner. Extensive experiments are carried out on the open TRECVID 2005 benchmark corpus, and promising results have been achieved. Compared with the electronic program guide (EPG), our solution provides a more generic approach to determining the exact boundaries of diverse TV programs, even including dramatic spots.
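
    The LSA step can be pictured in a few lines: build a term-by-shot matrix from the speech transcripts, take a truncated SVD, and compare shots by cosine similarity in the latent space. The toy transcripts and rank below are illustrative only.

        import numpy as np

        docs = ["goal scored by the striker",       # hypothetical transcripts
                "the striker scored again",
                "weather forecast sunny tomorrow"]
        vocab = sorted({w for d in docs for w in d.split()})
        A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        k = 2                               # latent dimensionality
        Z = (np.diag(s[:k]) @ Vt[:k]).T     # one k-dim row per shot

        def cos(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

        print(cos(Z[0], Z[1]))   # related shots -> high TCS
        print(cos(Z[0], Z[2]))   # unrelated shot -> low TCS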
  • Batch Nearest Neighbor Search for Video Retrieval

    Publication Year: 2008 , Page(s): 409 - 420
    Cited by:  Papers (6)

    To retrieve videos similar to a query clip from a large database, each video is often represented by a sequence of high-dimensional feature vectors. Typically, given a query video containing m feature vectors, an independent nearest neighbor (NN) search is first performed for each feature vector. After completing all the NN searches, an overall similarity is computed, i.e., a single content-based video retrieval usually involves m individual NN searches. Since nearby feature vectors in a video are normally similar, a large number of expensive random disk accesses are expected to occur repeatedly, which crucially affects the overall query performance. Batch nearest neighbor (BNN) search is defined as a batch operation that performs a number of individual NN searches. This paper presents a novel approach towards efficient high-dimensional BNN search, called dynamic query ordering (DQO), for advanced optimization of both I/O and CPU costs. Observing that the overlapped candidates (or search space) of a previous query may help to further reduce the candidate sets of subsequent queries, DQO aims at progressively finding a query order such that the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate set relationship of queries by a candidate overlapping graph (COG), DQO iteratively selects the next query to be executed based on its estimated pruning power over the remaining queries, using the dynamically updated COG. Extensive experiments are conducted on real video datasets and show the significance of our BNN query processing strategy.
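
    A greedy toy version of the ordering idea: repeatedly run the query whose candidate set overlaps the other pending queries the most, then shrink those queries' candidate sets by the candidates already resolved. The candidate sets are stand-ins for an index's output, not the paper's COG machinery.

        def order_queries(cands):
            """cands: dict query_id -> set of candidate ids."""
            order, resolved = [], set()
            pending = dict(cands)
            while pending:
                # pruning power ~ overlap with the other pending queries
                def power(q):
                    return sum(len(pending[q] & pending[r])
                               for r in pending if r != q)
                q = max(pending, key=power)
                order.append(q)
                resolved |= pending.pop(q)
                for r in pending:      # overlapped candidates come for free
                    pending[r] -= resolved
            return order

        cands = {"q1": {1, 2, 3}, "q2": {2, 3, 4}, "q3": {7, 8}}
        print(order_queries(cands))    # overlapping q1/q2 first, q3 last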
  • A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video

    Publication Year: 2008 , Page(s): 421 - 436
    Cited by:  Papers (33)

    Sports video annotation is important for sports video semantic analysis such as event detection and personalization. In this paper, we propose a novel approach for sports video semantic annotation and personalized retrieval. Different from state-of-the-art sports video analysis methods, which rely heavily on audio/visual features, the proposed approach incorporates web-casting text into sports video analysis. Compared with previous approaches, the contributions of our approach include the following. 1) The event detection accuracy is significantly improved due to the incorporation of web-casting text analysis. 2) The proposed approach is able to detect exact event boundaries and extract event semantics that are very difficult or impossible to handle with previous approaches. 3) The proposed method is able to create a personalized summary, from both general and specific points of view, related to a particular game, event, player, or team according to the user's preferences. We present the framework of our approach and the details of text analysis, video analysis, text/video alignment, and personalized retrieval. The experimental results on event boundary detection in sports video are encouraging and comparable to manually selected event boundaries. The evaluation shows that personalized retrieval is effective in helping meet users' expectations.
  • Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval

    Publication Year: 2008 , Page(s): 437 - 446
    Cited by:  Papers (56)

    In this paper, we consider the problem of multimedia document (MMD) semantics understanding and content-based cross-media retrieval. An MMD is a set of media objects of different modalities that carry the same semantics, and content-based cross-media retrieval is a new kind of retrieval method in which the query examples and search results can be of different modalities. Two levels of manifolds are learned to explore the relationships among all the data, at the level of the MMD and at the level of the media object, respectively. We first construct a Laplacian media object space for the media object representation of each modality and an MMD semantic graph to learn the MMD semantic correlations. The characteristics of media objects propagate along the MMD semantic graph, and an MMD semantic space is constructed to perform cross-media retrieval. Different methods are proposed to utilize relevance feedback, and experiments show that the proposed approaches are effective.
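
    The Laplacian spaces start from a neighborhood graph over the media objects. A minimal construction sketch (Gaussian affinities over k nearest neighbors, then L = D - W); the features and parameters are illustrative, not the paper's.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(6, 4))        # 6 hypothetical media objects
        k, sigma = 2, 1.0

        d2 = ((X[:, None] - X[None]) ** 2).sum(-1)  # squared distances
        W = np.zeros_like(d2)
        for i in range(len(X)):
            nn = np.argsort(d2[i])[1:k + 1]         # skip self at index 0
            W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
        W = np.maximum(W, W.T)                      # symmetrize the graph
        L = np.diag(W.sum(1)) - W                   # unnormalized Laplacian
        print(np.linalg.eigvalsh(L)[:3])            # basis for the embedding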
  • Image Retrieval With Relevance Feedback Based on Graph-Theoretic Region Correspondence Estimation

    Publication Year: 2008 , Page(s): 447 - 456
    Cited by:  Papers (6)

    This paper presents a graph-theoretic approach to interactive region-based image retrieval. When dealing with image matching problems, we use graphs to represent images, transform the region correspondence estimation problem into an inexact graph matching problem, and propose an optimization technique to derive the solution. We then define the image distance in terms of the estimated region correspondence. In the relevance feedback steps, using the estimated region correspondence, we propose a maximum likelihood method to re-estimate the ideal query and the image distance measurement. Experimental results show that the proposed graph-theoretic image matching criterion outperforms other methods that do not incorporate the spatial adjacency relationships within images. Furthermore, our maximum likelihood method, combined with the estimated region correspondence, improves the retrieval performance in the feedback steps.
  • Partitioning of Multiple Fine-Grained Scalable Video Sequences Concurrently Streamed to Heterogeneous Clients

    Publication Year: 2008 , Page(s): 457 - 469
    Cited by:  Papers (3)

    Fine-grained scalable (FGS) coding of video streams has been proposed in the literature to accommodate client heterogeneity. FGS streams are composed of two layers: a base layer, which provides basic quality, and a single enhancement layer that adds incremental quality refinements proportional to the number of bits received. The base layer uses nonscalable coding, which is more efficient in terms of compression ratio than the scalable coding used in the enhancement layer. Thus, for coding efficiency, larger base layers are desired. Larger base layers, however, disqualify more clients from receiving the stream. In this paper, we experimentally analyze this coding efficiency gap using diverse video sequences. For FGS sequences, we show that this gap is a non-increasing function of the base layer rate. We then formulate an optimization problem to determine the base layer rate of a single sequence that maximizes the average quality for a given client bandwidth distribution. We design an optimal and efficient algorithm (called FGSOPT) to solve this problem. We extend our formulation to the multiple-sequence case, in which a bandwidth-limited server concurrently streams multiple FGS sequences to diverse sets of clients. We prove that this problem is NP-complete. We design a branch-and-bound algorithm (called MFGSOPT) to compute the optimal solution. MFGSOPT runs fast for many typical cases because it intelligently cuts the search space. In the worst case, however, it has exponential time complexity. We also propose a heuristic algorithm (called MFGS) to solve the multiple-sequence problem. We experimentally show that MFGS produces near-optimal results and that it scales to large problems: it terminates in less than 0.5 s for problems with more than 30 sequences. Therefore, MFGS can be used in dynamic systems, where the server periodically adjusts the structure of FGS streams to suit current client distributions.
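
    The single-sequence trade-off can be sketched by exhaustively trying base-layer rates against a client bandwidth distribution: a higher base rate codes more efficiently but excludes clients below it. The quality model and distribution below are illustrative placeholders, not the paper's formulation.

        # (kbps, fraction of clients): hypothetical bandwidth distribution
        clients = [(300, 0.3), (800, 0.5), (1500, 0.2)]

        def quality(base, bw):
            if bw < base:          # client cannot receive the base layer
                return 0.0
            # toy model: base layer coded more efficiently than enhancement
            return 1.0 * base + 0.6 * (bw - base)

        def best_base_rate(candidates):
            return max(candidates,
                       key=lambda b: sum(f * quality(b, bw)
                                         for bw, f in clients))

        print(best_base_rate(range(100, 1600, 100)))  # -> 800 for these numbers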
  • An Admission Control Scheme Based on Online Measurement for VBR Video Streams Over Wireless Home Networks

    Publication Year: 2008 , Page(s): 470 - 479
    Cited by:  Papers (5)

    This paper presents an online measurement-based admission control scheme built on the observation that aggregate variable bit rate (VBR) video traffic is lognormally distributed. The proposed scheme consists of two components: a measurement process and an admission decision. The measurement process applies a linear Kalman filter to estimate the statistical parameters of the aggregate VBR video traffic. The estimated statistical parameters are used to calculate the effective bandwidth for the admission decision. VBR video traffic with high data rates is expected to occupy a dominant proportion of bandwidth in future wireless broadband home networks. To guarantee the quality of service (QoS) of such VBR video streams while achieving a high level of channel utilization, an efficient admission control scheme is urgently required, especially for emerging wireless indoor multimedia services such as HDTV and online video games. The proposed scheme is computationally efficient and accurate without requiring much prior traffic information. Simulation results verify its effectiveness and show that it performs well for both small and large numbers of connections.
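
    The measurement half reduces to tracking a slowly varying statistic from noisy samples; a scalar Kalman filter with a random-walk state model is the textbook version. The noise settings and rate samples below are illustrative, not the paper's tuning.

        # Track the mean aggregate rate; q = process noise, r = measurement noise.
        def kalman_track(measurements, q=0.01, r=0.5):
            x, p = measurements[0], 1.0     # initial estimate and variance
            out = []
            for z in measurements:
                p += q                      # predict (random-walk model)
                k = p / (p + r)             # Kalman gain
                x += k * (z - x)            # correct with measurement z
                p *= (1 - k)
                out.append(x)
            return out

        rates = [5.1, 4.8, 5.6, 6.0, 5.4, 5.9]   # hypothetical Mb/s samples
        print(kalman_track(rates)[-1])           # smoothed input to admission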
  • Efficient Key Distribution for Access Control in Pay-TV Systems

    Publication Year: 2008 , Page(s): 480 - 492
    Cited by:  Papers (10)

    The conditional access system (CAS) is an essential part of digital pay-TV systems for controlling access to program services. Conventionally, due to restrictions on bandwidth and computational capability, a CAS supports only period subscription services that are charged on a monthly basis. In this paper, based on the concept of hierarchical key assignment, we propose three key distribution schemes for access control in pay-TV systems. With these schemes, a CAS can support more charging strategies for service providers, such as adopting a smaller charging unit and allowing a subscription to any subset of channels, with little communication and computational overhead. In addition, the piracy management problem can be dealt with easily.
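
    One classical building block behind such schemes (not necessarily the authors' exact construction) is a backward hash chain: with K_t = H(K_{t+1}), handing a subscriber the single key of the last paid period lets the receiver derive all earlier period keys by repeated hashing.

        import hashlib

        def H(k: bytes) -> bytes:
            return hashlib.sha256(k).digest()

        def chain(seed: bytes, n: int):
            """Keys K_1..K_n with K_n = seed and K_t = H(K_{t+1})."""
            keys = [seed]
            for _ in range(n - 1):
                keys.append(H(keys[-1]))
            return list(reversed(keys))

        keys = chain(b"server-secret-seed", 12)  # e.g., one key per month
        k5 = keys[4]                             # given to a Jan-May subscriber
        derived = [k5]
        for _ in range(4):                       # hash back to months 4..1
            derived.append(H(derived[-1]))
        print([d == k for d, k in zip(reversed(derived), keys[:5])])  # all True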
  • Interpolation of Lost Speech Segments Using LP-HNM Model With Codebook Post-Processing

    Publication Year: 2008 , Page(s): 493 - 502
    Cited by:  Papers (3)

    This paper presents a method for the interpolation of lost speech segments. The interpolation method can be used for packet loss concealment in voice communication over mobile phones, for voice over IP, or for the restoration of lost segments in speech recordings. The method employs a combination of a linear prediction (LP) model of the spectral envelope and a harmonic noise model (HNM) of the excitation of speech. The speech interpolation problem is transformed into the modeling and interpolation of the trajectories of the LP parameters and of the amplitude, phase, and harmonicity of the HNM tracks of the speech excitation. In particular, the interpolation of harmonicity results in a smooth transition from voiced to unvoiced speech and vice versa. Crucially, the proposed interpolation method does not suffer from the consequences of the zero excitation of conventional autoregressive (AR) interpolation. Different combinations of linear and autoregressive interpolation methods are evaluated for the estimation of the time-varying parameters of the LP-HNM tracks. Furthermore, a post-processing codebook mapping, employed to enhance the interpolation of the spectral envelope, results in improved output quality for longer speech gaps. For different packet loss rates and patterns of distribution of the missing speech gaps, the proposed interpolation methods are evaluated and compared with popular AR-based interpolation methods and with the speech packet recovery method specified in the ITU G.711 standard, as a reference. The evaluation results show that the proposed methods substantially improve the restoration of formants and harmonic tracks and consistently result in significant performance gains and improved perceptual quality of speech.
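
    The core move is to interpolate parameter tracks across the gap rather than the waveform itself. A minimal linear-interpolation sketch over made-up parameter vectors (the paper also evaluates autoregressive variants):

        import numpy as np

        def interpolate_gap(last_good, next_good, n_missing):
            """Linearly interpolate parameter vectors over lost frames."""
            a = np.asarray(last_good, float)
            b = np.asarray(next_good, float)
            ts = np.arange(1, n_missing + 1) / (n_missing + 1)
            return [(1 - t) * a + t * b for t in ts]

        lsf_before = [0.12, 0.35, 0.60, 0.85]  # hypothetical LSF-like params
        lsf_after  = [0.10, 0.30, 0.65, 0.90]
        for frame in interpolate_gap(lsf_before, lsf_after, 3):
            print(frame.round(3))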
  • Scalable Joint Source and Channel Coding of Meshes

    Publication Year: 2008 , Page(s): 503 - 513
    Cited by:  Papers (5)

    This paper proposes a new approach for the joint source and channel coding (JSCC) of meshes, simultaneously providing scalability and optimized resilience against transmission errors. An unequal error protection approach is followed to cope with the different error-sensitivity levels characterizing the various resolution and quality layers produced by the scalable source codec. The number of layers and the protection level to be employed for each layer are determined by solving a joint source and channel coding problem. In this context, a novel fast algorithm for solving the optimization problem is conceived, enabling a real-time implementation of the JSCC rate allocation. An instantiation of the proposed JSCC approach is demonstrated for MeshGrid, a scalable 3-D object representation method that is part of MPEG-4 AFX. In this context, the L-infinite distortion metric is employed, which is to our knowledge a unique feature in mesh coding. Numerical results show the superiority of the L-infinite norm over the classical L-2 norm in a JSCC setting. One concludes that the proposed joint source and channel coding approach offers resilience against transmission errors, provides graceful degradation, enables a fast real-time implementation, and preserves all the scalability features and animation capabilities of the employed scalable mesh codec.
  • Format-Independent Rich Media Delivery Using the Bitstream Binding Language

    Publication Year: 2008 , Page(s): 514 - 522
    Cited by:  Papers (3)

    Several recent standards address virtual containers for rich multimedia content: collections of media with metadata describing the relationships between them, providing an immersive user experience. While these standards - which include MPEG-21 and TV-Anytime - provide numerous tools for interacting with rich media objects, they do not provide a framework for the streaming or delivery of such content. This paper presents the bitstream binding language (BBL), a format-independent tool that describes how multimedia content and metadata may be bound into delivery formats. Using a BBL description, a generic processor can map rich content (an MPEG-21 digital item, for example) into a streaming or static delivery format. BBL provides a universal syntax for the fragmentation and packetization of both XML and binary data, and it allows new content and metadata formats to be delivered without requiring the addition of new software to the delivery infrastructure. Following its development by the authors, BBL was adopted by MPEG as Part 18 of the MPEG-21 Multimedia Framework.
  • Efficient Multimedia Distribution in Source Constraint Networks

    Publication Year: 2008 , Page(s): 523 - 537
    Cited by:  Papers (6)

    In recent years, the number of peer-to-peer (P2P) applications has increased significantly. One important problem in many P2P applications is how to efficiently disseminate data from a single source to multiple receivers on the Internet. A successful model used for analyzing this problem is a graph consisting of nodes and edges, with a capacity assigned to each edge. In some situations, however, it is inconvenient to use this model. To that end, we propose to study the problem of efficient data dissemination in a source constraint network. A source constraint network is modeled as a graph in which capacity is associated with a node rather than an edge. The contributions of this paper include (a) a quantitative analysis of data dissemination in any source constraint network, (b) a set of topologies suitable for data dissemination in P2P networks, and (c) an architecture and implementation of a P2P system based on the proposed optimal topologies. We present experimental results for our P2P system deployed on PlanetLab nodes, demonstrating that our approach achieves near-optimal throughput while providing scalability, low delay, and bandwidth fairness among peers.
  • Maximum Likelihood Sound Source Localization and Beamforming for Directional Microphone Arrays in Distributed Meetings

    Publication Year: 2008 , Page(s): 538 - 548
    Cited by:  Papers (23)  |  Patents (1)

    In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and to perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework for these two techniques and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over six hours of recorded meetings.
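
    The ingredient shared with steered-response-power methods is a phase-weighted cross-correlation between microphone pairs, whose peak gives the time difference of arrival. A minimal GCC-PHAT sketch on a synthetic delayed signal (not the paper's full ML estimator):

        import numpy as np

        def gcc_phat(x, y, fs):
            n = len(x) + len(y)
            X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
            R = X * np.conj(Y)
            R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase only
            cc = np.fft.irfft(R, n)
            cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))
            return (np.argmax(cc) - n // 2) / fs   # delay estimate in seconds

        fs = 16000
        rng = np.random.default_rng(1)
        s = rng.normal(size=4096)
        lag = 23                            # simulated propagation delay
        x = s
        y = np.concatenate((np.zeros(lag), s))[:len(s)]
        print(round(gcc_phat(x, y, fs) * fs))   # -> -23: y lags x by 23 samples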
  • Decoder Initializing Technique for Improving Frame-Erasure Resilience of a CELP Speech Codec

    Publication Year: 2008 , Page(s): 549 - 553
    Cited by:  Papers (3)  |  Patents (1)

    The authors present and evaluate a technique for synchronizing the internal states of a code-excited linear prediction (CELP) encoder and decoder after the occurrence of a frame erasure. The designed technique, called "duplicated transmission" (DT), uses redundant information to realize synchronization. The encoder performs the encoding process twice and sends two codes for each frame. One code is produced by an encoder whose internal state has been initialized; this code is used in cases where the previous frame is erased. An onset detector is combined with the DT technique to select the frames to which DT should be applied. Subjective test results suggest that, by introducing DT selectively, the number of DT frames can be reduced by about 80% without degrading the subjective quality. The results demonstrate that synchronization of the internal states is effective in cases of erasure at onsets. The DT technique requires no additional algorithmic delay. For that reason, it would be a better choice for applications in which delay has a significant impact.
  • IEEE Transactions on Multimedia EDICS

    Publication Year: 2008 , Page(s): 554
    Freely Available from IEEE

Aims & Scope

The Periodical covers the various aspects of research in multimedia technology and the applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo