IEEE Transactions on Multimedia

Issue 8 • Dec. 2008

  • Table of contents

    Page(s): C1 - C4
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
  • A Constrained Probabilistic Petri Net Framework for Human Activity Detection in Video

    Page(s): 1429 - 1443

    Recognition of human activities in restricted settings such as airports, parking lots and banks is of significant interest in security and automated surveillance systems. In such settings, data is usually in the form of surveillance videos with wide variation in quality and granularity. Interpretation and identification of human activities requires an activity model that a) is rich enough to handle complex multi-agent interactions, b) is robust to uncertainty in low-level processing and c) can handle ambiguities in the unfolding of activities. We present a computational framework for human activity representation based on Petri nets. We propose an extension—Probabilistic Petri Nets (PPN)—and show how this model is well suited to address each of the above requirements in a wide variety of settings. We then focus on answering two types of questions: (i) what are the minimal sub-videos in which a given activity is identified with a probability above a certain threshold and (ii) for a given video, which activity from a given set occurred with the highest probability? We provide the PPN-MPS algorithm for the first problem, as well as two different algorithms (naive PPN-MPA and PPN-MPA) to solve the second. Our experimental results on a dataset consisting of bank surveillance videos and an unconstrained TSA tarmac surveillance dataset show that our algorithms are both fast and provide high quality results.
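
    The firing semantics behind such a probabilistic Petri net can be illustrated with a toy example. This is a minimal sketch under assumed semantics (sequential firing, one probability per transition); the class names and the "unattended bag" activity are invented, and the paper's PPN-MPS/PPN-MPA algorithms are not reproduced here.

```python
# Minimal sketch of a probabilistic Petri net for activity recognition.
# All names and probabilities are illustrative, not from the paper.

class Transition:
    def __init__(self, name, pre, post, prob):
        self.name = name      # observable event label
        self.pre = pre        # places that must hold a token to fire
        self.post = post      # places that receive a token after firing
        self.prob = prob      # probability assigned to this transition

def activity_probability(marking, transitions, observed):
    """Fire transitions matching the observed event sequence and
    multiply their probabilities; return 0.0 if the sequence is not
    a valid unfolding of the net."""
    marking = set(marking)
    p = 1.0
    by_name = {t.name: t for t in transitions}
    for event in observed:
        t = by_name.get(event)
        if t is None or not set(t.pre) <= marking:
            return 0.0        # event impossible in the current marking
        marking -= set(t.pre)
        marking |= set(t.post)
        p *= t.prob
    return p

# Toy "unattended bag" activity: enter -> drop bag -> leave.
net = [
    Transition("enter", ["start"], ["inside"], 0.9),
    Transition("drop", ["inside"], ["bag_down"], 0.7),
    Transition("leave", ["bag_down"], ["alarm"], 0.8),
]
p = activity_probability(["start"], net, ["enter", "drop", "leave"])
```

    A sequence that skips a required step (e.g., "leave" before "drop") yields probability zero, which is how the net encodes valid unfoldings.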

  • Modeling and Optimization of Meta-Caching Assisted Transcoding

    Page(s): 1444 - 1454

    The increase of aggregate Internet bandwidth and the rapid development of 3G wireless networks demand efficient delivery of multimedia objects to all types of wireless devices. To handle requests from wireless devices at runtime, the transcoding-enabled caching proxy has been proposed to save transcoded versions and thus reduce the intensive computing demanded by online transcoding. Constrained by available CPU and storage, existing transcoding-enabled caching schemes always selectively cache certain transcoded versions, expecting that many future requests can be served from the cache. But such schemes treat the transcoder as a black box, leaving no room for flexible joint management of CPU and storage resources. In this paper, we first introduce the idea of meta-caching by looking inside the transcoding procedure. Instead of caching selected transcoded versions in full, meta-caching identifies intermediate transcoding steps whose intermediate results (called metadata) can be cached, so that a fully transcoded version can be easily produced from the metadata with a small number of CPU cycles. By achieving a large saving in caching space at a possibly small cost in CPU load, the proposed meta-caching scheme provides a unique method to balance the utilization of CPU and storage resources at the proxy. We further construct a model to analyze the meta-caching scheme. Based on the analysis, we propose AMTrac, Adaptive Meta-caching for Transcoding, which adaptively applies meta-caching based on client request patterns and available resources. Experimental results show that AMTrac can significantly improve system throughput over existing approaches.
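
    The CPU-versus-storage balance that meta-caching exploits can be sketched as a toy per-object cost comparison. All sizes, rates, and cost weights below are invented for illustration; this is not the paper's model or the AMTrac algorithm.

```python
# Hypothetical per-object cost comparison: cache the full transcoded
# version, cache only metadata, or transcode on every request.

def best_choice(full_size, meta_size, cpu_full, cpu_meta, req_rate,
                storage_cost=1.0, cpu_cost=0.001):
    """Return 'full', 'meta', or 'transcode' by comparing a simple
    steady-state cost: storage held plus CPU spent per request."""
    options = {
        "full": full_size * storage_cost,
        "meta": meta_size * storage_cost + req_rate * cpu_meta * cpu_cost,
        "transcode": req_rate * cpu_full * cpu_cost,
    }
    return min(options, key=options.get)

# Metadata is 10x smaller than the full version and needs only a
# fraction of the full transcoding CPU per request.
choice = best_choice(full_size=100, meta_size=10,
                     cpu_full=5000, cpu_meta=500, req_rate=20)
```

    With these made-up numbers, caching metadata dominates both alternatives, which mirrors the trade-off described above: a big storage saving bought with a small amount of extra CPU.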

  • Joint Video Coding and Statistical Multiplexing for Broadcasting Over DVB-H Channels

    Page(s): 1455 - 1464

    A novel joint video encoding and statistical multiplexing (StatMux) method for broadcasting over digital video broadcasting for handhelds (DVB-H) channels is proposed to improve the quality of encoded video and to decrease the end-to-end delay in a broadcast system. The main parts of end-to-end delay in a DVB-H system result from the time-sliced transmission scheme used in DVB-H and from the bit rate variations of service bit streams. The time-sliced transmission scheme is utilized in DVB-H to reduce the power consumption of DVB-H receivers. Variable bit rate (VBR) video bit streams are used in DVB-H to improve the video quality and compression performance. However, the time-sliced transmission scheme increases the channel switching delay in DVB-H, i.e., the delay in switching to a new audio-visual service, and VBR bit streams increase the required buffering delays in the whole system. The different parts of end-to-end delay in a DVB-H system can be influenced by the video encoding and multiplexing methods used. Different scenarios for encoding and StatMux of video sources for DVB-H applications are studied in this paper. Moreover, a new method for joint encoding and StatMux of video sources is proposed that not only decreases the end-to-end delay but also improves the average quality of compressed video by dynamically distributing the available bandwidth between the video sources according to their relative complexity. The performance of the proposed method is validated by simulation results.

  • Video Capacity of WLANs With a Multiuser Perceptual Quality Constraint

    Page(s): 1465 - 1478

    As wireless local area networks (WLANs) become a part of our network infrastructure, it is critical that we understand both the performance provided to the end users and the capacity of these WLANs in terms of the number of supported flows (calls). Since it is clear that video traffic, as well as voice and data, will be carried by these networks, it is particularly important that we investigate these issues for packetized video. In this paper, we investigate the video user capacity of wireless networks subject to a multiuser perceptual quality constraint. As a particular example, we study the transmission of AVC/H.264 coded video streams over an IEEE 802.11a WLAN subject to a constraint on the quality of the delivered video experienced by r% (75%, for example) of the users of the WLAN. This work appears to be the first such effort to address this difficult but important problem. Furthermore, the methodology employed is perfectly general and can be used for different networks, video codecs, transmission channels, protocols, and perceptual quality measures.
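
    The shape of such a percentile constraint can be illustrated with a toy admission check: keep adding flows while at least r% of users still meet a minimum quality score. The quality numbers and the monotonic-degradation assumption below are invented for illustration and are not the paper's methodology.

```python
# Hypothetical multiuser perceptual quality constraint: find the
# largest number of flows for which at least r% of users are at or
# above a quality threshold.

def capacity(quality_per_load, r, threshold):
    """quality_per_load[n-1] holds per-user quality scores with n
    flows active; assumes quality degrades monotonically with load."""
    best = 0
    for n, scores in enumerate(quality_per_load, start=1):
        ok = sum(s >= threshold for s in scores)
        if ok / len(scores) * 100 >= r:
            best = n
        else:
            break   # further load only makes quality worse
    return best

loads = [
    [40, 39, 41],          # 1 flow: everyone comfortably above threshold
    [38, 36, 37, 35],      # 2 flows: still fine
    [33, 31, 36, 30, 29],  # 3 flows: most users drop below threshold
]
cap = capacity(loads, r=75, threshold=34)
```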

  • Robust and Transparent Color Modulation for Text Data Hiding

    Page(s): 1479 - 1489

    This paper improves the use of text color modulation (TCM) as a reliable text document data hiding method. Using TCM, the characters in a document have their color components modified (possibly imperceptibly) according to a side message to be embedded. This work presents a detection metric and an analysis determining the detection error rate in TCM, considering an assumed print and scan (PS) channel model. In addition, a perceptual impact model is employed to evaluate the perceptual difference between a modified and a non-modified character. Combining this perceptual model with the results from the detection error analysis, it is possible to determine the optimum color modulation values. The proposed detection metric also exploits the orientation characteristics of color halftoning to reduce the error rate. In particular, because color halftoning algorithms use different screen orientation angles for each color channel, this is used as an effective feature to detect the embedded message. Experiments illustrate the validity of the analysis and the applicability of the method.

  • Fragile Watermarking With Error-Free Restoration Capability

    Page(s): 1490 - 1499

    This paper proposes a novel fragile watermarking scheme capable of perfectly recovering the original image from its tampered version. In the scheme, a tailor-made watermark consisting of reference-bits and check-bits is embedded into the host image using a lossless data hiding method. On the receiver side, by comparing the extracted and calculated check-bits, one can identify the tampered image-blocks. Then, the reliable reference-bits extracted from other blocks are used to exactly reconstruct the original image. Although content replacement may destroy a portion of the embedded watermark data, as long as the tampered area is not too extensive, the original image information can be restored without any error.
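
    The check-bit comparison step can be sketched in a few lines: recompute per-block check-bits on the receiver side and flag blocks whose stored bits disagree. The hash choice, block representation, and bit width below are illustrative assumptions, not the paper's construction, and the reference-bit restoration step is omitted.

```python
# Sketch of block-wise tamper localization via check-bits.
import hashlib

def check_bits(block, n_bits=16):
    """Derive n_bits check-bits from an image block (a list of
    0-255 pixel values) via a cryptographic hash."""
    h = hashlib.sha256(bytes(block)).digest()
    return int.from_bytes(h[:2], "big") & ((1 << n_bits) - 1)

def find_tampered(blocks, stored_bits):
    """Return indices of blocks whose recomputed check-bits do not
    match the extracted (stored) ones."""
    return [i for i, (b, s) in enumerate(zip(blocks, stored_bits))
            if check_bits(b) != s]

original = [[10, 20, 30, 40], [50, 60, 70, 80]]
stored = [check_bits(b) for b in original]
received = [[10, 20, 30, 40], [50, 60, 99, 80]]   # block 1 modified
tampered = find_tampered(received, stored)
```

    In the paper's scheme the flagged blocks would then be rebuilt from reference-bits carried by the untampered blocks.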

  • Difference Expansion Based Reversible Data Hiding Using Two Embedding Directions

    Page(s): 1500 - 1512

    Current difference-expansion (DE) embedding techniques perform one layer of embedding in a difference image. They do not turn to the next difference image for another layer of embedding unless the current difference image has no expandable differences left. The obvious disadvantage of these techniques is that image quality may have been severely degraded even before the later layer embedding begins, because the previous layer embedding has used up all expandable differences, including those with large magnitude. Based on the integer Haar wavelet transform, we propose a new DE embedding algorithm, which utilizes the horizontal as well as vertical difference images for data hiding. We introduce a dynamic expandable-difference search and selection mechanism. This mechanism gives even chances to small differences in the two difference images and effectively avoids the situation in which the largest differences in the first difference image are used up while there is almost no chance to embed in small differences of the second difference image. We also present an improved histogram-based difference selection and shifting scheme, which refines our algorithm and makes it resilient to different types of images. Compared with current algorithms, the proposed algorithm often offers a better trade-off between embedding capacity and image quality. The advantage of our algorithm is more obvious near the embedding rate of 0.5 bpp.
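
    The basic difference-expansion step on a single pixel pair (the building block that the paper extends to horizontal and vertical difference images via the integer Haar transform) can be sketched as follows; the pixel values are arbitrary and overflow/expandability checks are omitted for brevity.

```python
# Classic difference-expansion embedding on one pixel pair:
# expand the difference h to 2h + bit, keep the integer average.

def de_embed(x, y, bit):
    """Embed one bit by expanding the difference of a pixel pair."""
    l = (x + y) // 2          # integer average (Haar low-pass)
    h = x - y                 # difference     (Haar high-pass)
    h2 = 2 * h + bit          # expand and append the payload bit
    x2 = l + (h2 + 1) // 2
    y2 = l - h2 // 2
    return x2, y2

def de_extract(x2, y2):
    """Recover the embedded bit and the original pair exactly."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 & 1
    h = h2 >> 1               # floor division undoes the expansion
    x = l + (h + 1) // 2
    y = l - h // 2
    return x, y, bit

x2, y2 = de_embed(206, 201, 1)   # pair (206, 201), payload bit 1
```

    The embedding is exactly reversible, which is what makes multi-layer embedding order (the paper's concern) matter: each layer consumes expandable differences that later layers can no longer use.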

  • A Comprehensive Survey on Three-Dimensional Mesh Watermarking

    Page(s): 1513 - 1527

    Three-dimensional (3-D) meshes have been used more and more in industrial, medical and entertainment applications during the last decade. Many researchers, from both the academic and the industrial sectors, have become aware of the intellectual property protection and authentication problems arising with the increasing use of 3-D meshes. This paper gives a comprehensive survey on 3-D mesh watermarking, which is considered an effective solution to these two emerging problems. Our survey covers an introduction to the relevant state of the art, an attack-centric investigation, and a list of existing problems and potential solutions. First, the particular difficulties encountered while applying watermarking to 3-D meshes are discussed. Then we present and analyze the existing algorithms, distinguishing between fragile techniques and robust techniques. Since attacks play an important role in the design of 3-D mesh watermarking algorithms, we also provide an attack-centric viewpoint of the state of the art. Finally, some future research directions are pointed out, especially concerning ways of devising robust and blind algorithms and some promising new watermarking feature spaces.

  • Discriminant Graph Structures for Facial Expression Recognition

    Page(s): 1528 - 1540

    In this paper, a series of advances in elastic graph matching for facial expression recognition are proposed. More specifically, a new technique for the selection of the most discriminant facial landmarks for every facial expression (discriminant expression-specific graphs) is applied. Furthermore, a novel kernel-based technique for discriminant feature extraction from graphs is presented. This feature extraction technique remedies some of the limitations of the typical kernel Fisher discriminant analysis (KFDA) which provides a subspace of very limited dimensionality (i.e., one or two dimensions) in two-class problems. The proposed methods have been applied to the Cohn-Kanade database in which very good performance has been achieved in a fully automatic manner.

  • Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos

    Page(s): 1541 - 1552

    Identifying the active speaker in a video of a distributed meeting can be very helpful for remote participants to understand the dynamics of the meeting. A straightforward application of such analysis is to stream a high resolution video of the speaker to the remote participants. In this paper, we present the challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and propose a novel boosting-based multimodal speaker detection (BMSD) algorithm. Instead of separately performing sound source localization (SSL) and multiperson detection (MPD) and subsequently fusing their individual results, the proposed algorithm fuses audio and visual information at the feature level by using boosting to select features from a combined pool of both audio and visual features simultaneously. The result is a very accurate speaker detector with extremely high efficiency. In experiments that include hundreds of real-world meetings, the proposed BMSD algorithm reduces the error rate of the SSL-only approach by 24.6%, and that of the SSL and MPD fusion approach by 20.9%. To the best of our knowledge, this is the first real-time multimodal speaker detection algorithm deployed in commercial products.
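
    The feature-level fusion idea (boosting selecting from one combined audio + visual pool) can be illustrated with a single decision-stump selection round. The data, feature names, and labels below are fabricated; this toy selects the best weighted stump, the inner loop of AdaBoost, and is not the BMSD algorithm itself.

```python
# Toy stump selection over a combined audio + visual feature pool.

def best_stump(X, y, w):
    """Pick the (feature, threshold, polarity) decision stump with
    the lowest weighted classification error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                pred = [pol if x[f] >= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

# 4 windows, features = [audio_ssl_score, visual_motion];
# label +1 means "active speaker" at that location.
X = [[0.9, 0.2], [0.8, 0.1], [0.2, 0.9], [0.1, 0.3]]
y = [1, 1, -1, -1]
w = [0.25] * 4
err, feat, thr, pol = best_stump(X, y, w)
```

    Because both modalities sit in one pool, boosting itself decides whether an audio or a visual feature is most discriminative at each round; in this fabricated example the audio feature (index 0) separates the classes perfectly.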

  • Content-Aware Prediction Algorithm With Inter-View Mode Decision for Multiview Video Coding

    Page(s): 1553 - 1564

    3-D video will become one of the most significant video technologies in next-generation television. Due to the ultra-high data bandwidth requirement of 3-D video, effective compression technology becomes an essential part of the infrastructure, and multiview video coding (MVC) plays a critical role. However, MVC systems require much more memory bandwidth and computational complexity than mono-view video coding systems. Therefore, an efficient prediction scheme is necessary for encoding. In this paper, a new fast prediction algorithm, the content-aware prediction algorithm (CAPA) with inter-view mode decision, is proposed. By utilizing disparity estimation (DE) to find corresponding blocks between different views, coding information such as rate-distortion cost, coding modes, and motion vectors can be effectively shared and reused from the already coded view channel. Therefore, the computation for motion estimation (ME) in most view channels can be greatly reduced. Experimental results show that, compared with the full search block matching algorithm (FSBMA) applied to both ME and DE, the proposed algorithm saves 98.4-99.1% of the computational complexity of ME in most view channels with a negligible quality loss of only 0.03-0.06 dB in PSNR.

  • Synthesis of Silhouettes and Visual Hull Reconstruction for Articulated Humans

    Page(s): 1565 - 1577

    In this paper, we propose a complete framework for improved synthesis and understanding of the human pose from a limited number of silhouette images. It combines the active image-based visual hull (IBVH) algorithm and a contour-based body part segmentation technique. We derive a simple, approximate algorithm to determine the extrinsic parameters of a virtual camera, and synthesize a turntable image collection of the person using the IBVH algorithm by actively moving the virtual camera along a properly computed circular trajectory around the person. Using the turning function distance as the silhouette similarity measurement, this approach can generate the desired pose-normalized images for recognition applications. To overcome the inability of the visual hull (VH) method to reconstruct concave regions, we propose a contour-based human body part localization algorithm that segments the silhouette images into convex body parts. The body parts observed from the virtual view are generated separately from the corresponding body parts observed from the input views and then assembled together for a more accurate VH reconstruction. Furthermore, the obtained turntable image collection helps to improve the body part segmentation and identification process. By using the inner distance shape context (IDSC) measurement, we can estimate body part locations more accurately from a synthesized view in which the body parts are more precisely localized. Experiments show that the proposed algorithm can greatly improve body part segmentation and hence shape reconstruction results.

  • Spatiotemporal Motion Analysis for the Detection and Classification of Moving Targets

    Page(s): 1578 - 1591

    This paper presents a video surveillance system for a stationary camera that can extract moving targets from a video stream in real time and classify them into predefined categories according to their spatiotemporal properties. Targets are detected by computing the pixel-wise difference between consecutive frames, and then classified with a temporally boosted classifier and "spatiotemporal-oriented energy" analysis. We demonstrate that the proposed classifier can successfully recognize five types of objects: a person, a bicycle, a motorcycle, a vehicle, and a person with an umbrella. In addition, we process targets that do not match any of the AdaBoost-based classifier's categories by using a secondary classification module that categorizes such targets as crowds of individuals or non-crowds. We show that the above classification task can be performed effectively by analyzing a target's spatiotemporal-oriented energies, which provide a rich description of the target's spatial and dynamic features. Our experimental results demonstrate that the proposed system is extremely effective in recognizing all predefined object classes.
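
    The first stage mentioned above, pixel-wise differencing between consecutive frames, is simple enough to sketch directly; the frame values and the threshold are illustrative, and the boosted classification stages are not reproduced here.

```python
# Minimal pixel-wise frame-difference detector for moving targets.

def moving_mask(prev, curr, thresh=20):
    """Return a binary mask marking pixels whose intensity changed
    by more than `thresh` between two consecutive frames."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 90, 10],
        [10, 95, 10]]
mask = moving_mask(prev, curr)   # middle column flagged as moving
```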

  • Multilabel Neighborhood Propagation for Region-Based Image Retrieval

    Page(s): 1592 - 1604

    Content-based image retrieval (CBIR) has been an active research topic in the last decade. As one of the most promising approaches, graph-based semi-supervised learning has attracted many researchers. However, while related work has mainly focused on global visual features, little attention has been paid to region-based image retrieval (RBIR). In this paper, a framework based on multilabel neighborhood propagation is proposed for RBIR, which can be characterized by three key properties: (1) For graph construction, in order to determine the edge weights robustly and automatically, a mixture distribution is introduced into the Earth mover's distance (EMD) and a linear programming framework is involved. (2) Multiple low-level labels for each image can be obtained from a generative model, and the correlations among different labels are explored when the labels are propagated simultaneously on the weighted graph. (3) By introducing multilayer semantic representation (MSR) and the support vector machine (SVM) into long-term learning, a more exact weighted graph for label propagation and more meaningful high-level labels to describe the images can be computed. Experimental results, including comparisons with state-of-the-art retrieval systems, demonstrate the effectiveness of our proposal.

  • Multi-Layer Multi-Instance Learning for Video Concept Detection

    Page(s): 1605 - 1616

    This paper presents a novel learning-based method, called "multi-layer multi-instance (MLMI) learning," for video concept detection. Most existing methods have treated video as a flat data sequence and have not deeply investigated the intrinsic hierarchical structure of the video content. However, video is essentially a medium with a multi-layer (ML) structure. For example, a video can be represented by a hierarchical structure including, from large to small, shot, frame, and region, where each pair of contiguous layers fits the typical multi-instance (MI) setting. We call such an ML structure together with the MI relations embedded in it the MLMI setting. In this paper, we systematically study both the ML structure and the MI relations embedded in video content by formulating video concept detection as an MLMI learning problem. Specifically, we first construct an MLMI kernel to simultaneously model the ML structure and MI relations. To deal with the ambiguity propagation problem introduced by weak labeling and the ML structure, we then propose a regularization framework which takes hyper-bag prediction error, sublayer prediction error, inter-layer inconsistency, and classifier complexity into consideration. We have applied the proposed MLMI learning method to the concept detection task over the TRECVid 2005 development corpus, and report better performance than vector-based and state-of-the-art MI learning methods.

  • A Mid-Level Representation for Melody-Based Retrieval in Audio Collections

    Page(s): 1617 - 1625

    Searching audio collections using high-level musical descriptors is a difficult problem, due to the lack of reliable methods for extracting melody, harmony, rhythm, and other such descriptors from unstructured audio signals. In this paper, we present a novel approach to melody-based retrieval in audio collections. Our approach supports audio, as well as symbolic queries and ranks results according to melodic similarity to the query. We introduce a beat-synchronous melodic representation consisting of salient melodic lines, which are extracted from the analyzed audio signal. We propose the use of a 2D shift-invariant transform to extract shift-invariant melodic fragments from the melodic representation and demonstrate how such fragments can be indexed and stored in a song database. An efficient search algorithm based on locality-sensitive hashing is used to perform retrieval according to similarity of melodic fragments. On the cover song detection task, good results are achieved for audio, as well as for symbolic queries, while fast retrieval performance makes the proposed system suitable for retrieval in large databases.
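
    The locality-sensitive hashing step can be sketched with random hyperplane hashing, one standard LSH family, used here purely for illustration; the fragment vectors and song names are invented, and the paper's melodic representation is not reproduced.

```python
# Sketch of LSH indexing for melodic fragments via random
# hyperplane (sign) hashing: nearby vectors tend to share a key.
import random

def hash_fragment(vec, planes):
    """Project onto random hyperplanes; the pattern of dot-product
    signs forms the hash key."""
    key = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

random.seed(0)
planes = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]

# Index two fabricated 4-dimensional melodic fragments.
index = {}
db = {"songA": [1.0, 0.9, 0.1, 0.0], "songB": [0.0, 0.1, 0.9, 1.0]}
for song, frag in db.items():
    index.setdefault(hash_fragment(frag, planes), []).append(song)

query = [1.0, 0.9, 0.1, 0.0]    # identical to songA's fragment
hits = index.get(hash_fragment(query, planes), [])
```

    An identical fragment is guaranteed to land in the same bucket; near-identical fragments collide with high probability, which is what makes sub-linear retrieval in a large song database possible.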

  • A Query-by-Singing System for Retrieving Karaoke Music

    Page(s): 1626 - 1637

    This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively, and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
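
    The DTW core of the melody comparison can be sketched in its classic single-pass form; the paper's multiple-pass DTW and MLDA refinements are omitted, and the note sequences below are invented MIDI-style pitch values.

```python
# Classic O(n*m) dynamic time warping with absolute-difference cost.

def dtw(a, b):
    """Minimum-cost alignment distance between two note sequences,
    allowing notes to be stretched or compressed in time."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A sung query that holds one note longer still matches its song.
song = [60, 62, 64, 65]
query = [60, 62, 62, 64, 65]
d = dtw(song, query)
```

    This tolerance to local tempo variation is exactly why DTW suits sung queries, whose timing rarely matches the reference melody.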

  • Error Concealment for Frame Losses in MDC

    Page(s): 1638 - 1647

    Multiple description coding (MDC) is an effective error resilience (ER) technique for video coding. In case of frame loss, error concealment (EC) techniques can be used in MDC to reconstruct the lost frame, with error, from which subsequent frames can be decoded directly. With such direct decoding, the subsequent decoded frames will gradually recover from the frame loss, though slowly. In this paper we propose a novel algorithm using multihypothesis error concealment (MHC) to improve the error recovery rate of any EC in the temporal subsampling MDC. In MHC, the simultaneous temporal-interpolated frame is used as an additional hypothesis to improve the reconstructed video quality after the lost frame. Both subjective and objective results show that MHC can achieve significantly better video quality than direct decoding. View full abstract»
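
    The multihypothesis idea, blending the direct-decoded frame with a temporally interpolated hypothesis, can be sketched on 1-D "frames"; the equal 0.5 weight and the pixel values are illustrative assumptions, not the paper's combining rule.

```python
# Toy multihypothesis concealment: combine the direct decode with
# a temporal interpolation of the neighboring frames.

def interpolate(prev_frame, next_frame):
    """Temporal-interpolation hypothesis for the lost frame."""
    return [(p + n) / 2 for p, n in zip(prev_frame, next_frame)]

def conceal(direct, prev_frame, next_frame, w=0.5):
    """Blend the two hypotheses; w weights the direct decode."""
    interp = interpolate(prev_frame, next_frame)
    return [w * d + (1 - w) * i for d, i in zip(direct, interp)]

prev_frame = [100, 100, 100]
next_frame = [110, 110, 110]
direct = [120, 90, 105]          # error-propagated direct decode
out = conceal(direct, prev_frame, next_frame)
```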

  • A Joint Source-Channel Video Coding Scheme Based on Distributed Source Coding

    Page(s): 1648 - 1656

    Recently, several error resilient schemes have been proposed to tackle the error propagation problem in motion-compensated predictive video coding based on a promising technique: distributed source coding (DSC). However, these schemes mainly apply distributed source codes for channel error correction, while under-utilizing their capability for data compression. A channel-aware joint source-channel video coding scheme based on DSC is proposed to eliminate the error propagation problem in predictive video coding in a more efficient way. It is known that DSC performance near the Slepian-Wolf bound is achieved using powerful channel codes, assuming the source and its reference (also known as side information) are connected by a virtual error-prone channel. In the proposed scheme, the virtual and real error-prone channels are fused so that a single unified channel code is applied to encode the current frame, thus accomplishing joint source-channel coding. Our analysis of the rate efficiency in recovering from error propagation shows that the joint scheme can achieve a lower rate than performing source and channel coding separately. Simulation results show that the number of bits used for recovering from error propagation can be reduced by up to 10% using the proposed scheme compared to Sehgal-Jagmohan-Ahuja's DSC-based error resilient scheme.

  • Variable Time Scale Multimedia Streaming Over IP Networks

    Page(s): 1657 - 1670

    This paper presents a comprehensive analysis of a variable time-scale streaming technique, VTSS, in which rate changes are obtained by varying the inter-packet transmission interval rather than, as in most cases, altering the source coding rate. Instead of constraining the transmitter to operate in real time, the time scale of the packet scheduler can vary from zero, when the network is congested, to as fast as the channel bandwidth allows, when the network is lightly loaded. Although this approach is reportedly used in commercial streaming products, so far the technique has neither been analyzed in a rigorous fashion nor compared to other state-of-the-art streaming techniques. This work first presents a theoretical analysis of the performance achievable by the VTSS approach, and shows that, for the same channel conditions, VTSS yields a total distortion which is lower than or, in the worst case, equal to the distortion of the standard real-time source-rate adaptive approach. A lower bound on receiver buffer size is also derived. Network simulations then analyze the performance of a TCP-friendly test implementation of VTSS compared with an ideal real-time source rate-adaptive technique, whose performance, being ideal, represents the upper bound of any transmission scheme based on source rate adaptation. The simulation results, also based on actual network traces, show that the VTSS approach delivers higher perceptual quality (up to 1.2 dB PSNR in the considered scenarios) and reduced video quality fluctuations (1.6 dB standard deviation of PSNR, instead of 4.9 dB) for a wide range of standard video sequences. Perceptual quality evaluation by means of PVQM confirms these results. The gains, as expected, are even more pronounced (7.6 dB PSNR on average) when compared to real-time constant bit-rate video transmission.
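
    The variable time-scale idea, draining the send buffer faster than real time when the path is idle and pausing under congestion, can be sketched as a toy slot-based scheduler; the frame size, slot bandwidths, and whole-frame granularity are invented for illustration and are not the paper's scheduler.

```python
# Sketch of variable time-scale scheduling: per slot, send as many
# whole frames as the currently available bandwidth permits.

def schedule(frames, available_bw, frame_bits=1000):
    """Return the number of frames sent in each time slot: 0 when
    the network is congested (bw below one frame), several when it
    is lightly loaded, until the backlog is drained."""
    sent = []
    backlog = len(frames)
    for bw in available_bw:
        n = min(backlog, bw // frame_bits)
        sent.append(n)
        backlog -= n
        if backlog == 0:
            break
    return sent

# 6 buffered frames; slot bandwidths in bits: congested, then idle.
plan = schedule(list(range(6)), [0, 4000, 3000])
```

    Note how the second slot runs at four times real time to make up for the congested first slot, which is the mechanism behind VTSS's lower distortion compared with slowing the source coder down.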

  • Multiple Distortion Measures for Packetized Scalable Media

    Page(s): 1671 - 1686

    As the diversity of end-user devices and networks grows, it becomes important to serve media content efficiently and adaptively to different types of users. A key question in adaptive media is how to perform rate-distortion optimized scheduling. Typically, distortion is measured with a single distortion measure, such as the mean-squared error with respect to the original high-resolution image or video sequence. Given the growing diversity of users with varying capabilities, such as different display sizes and resolutions, we introduce Multiple Distortion Measures (MDM) to account for a diverse range of users and target devices. MDM provides a clear framework with which to evaluate the performance of media systems that serve a variety of users. Scalable coders, such as JPEG2000 and H.264/MPEG-4 SVC, allow adaptation to be performed at relatively low computational cost. We show that accounting for MDM can significantly improve system performance; moreover, combined with scalable coding, this can be done efficiently. Given these MDM, we propose an algorithm to generate embedded schedules, enabling low-complexity adaptive streaming of scalable media packets that minimizes distortion across multiple users. We show that using MDM achieves up to 4 dB gains for spatial scalability applied to images and 12 dB gains for temporal scalability applied to video.
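An embedded schedule under multiple distortion measures can be illustrated with a simple greedy ordering. This sketch is not the paper's algorithm: the packet representation, the per-measure distortion reductions, and the user-class weights are all illustrative assumptions.

```python
# Greedy construction of an embedded packet schedule under multiple
# distortion measures: each packet carries its size and, for every user
# class (distortion measure), the distortion reduction it yields. Packets
# are ordered by weighted distortion reduction per bit, so any prefix of
# the schedule is a reasonable transmission plan.

def embedded_schedule(packets, weights):
    """packets: list of (size_bits, [delta_D per measure]);
    weights: importance of each user class. Returns indices in send order."""
    def utility(p):
        size, deltas = p
        return sum(w * d for w, d in zip(weights, deltas)) / size
    return sorted(range(len(packets)),
                  key=lambda i: utility(packets[i]), reverse=True)

pkts = [(1000, [4.0, 0.5]),   # helps high-resolution users most
        (500,  [1.0, 2.0]),   # refinement useful to low-resolution users
        (2000, [1.0, 0.2])]
order = embedded_schedule(pkts, weights=[0.5, 0.5])
```

The embedded property matters for adaptation: a slower link simply truncates the same ordering earlier instead of recomputing a schedule per user.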

  • Joint Optimal Multipath Routing and Rate Control for Multidescription Coded Video Streaming in Ad Hoc Networks

    Page(s): 1687 - 1697

    This paper studies joint multipath routing and rate control for multidescription coded (MD-coded) video streaming in wireless ad hoc networks. In addition to selecting a pair of paths to optimize the expected end-to-end video quality, we explore an optimal packet-skipping strategy for rate control in order to minimize the impact of the skipped packets on video quality. The R-D hint information, consisting of the size of each packet in bits and its importance for reconstructing the video, is used to characterize the packets in an R-D sense. Since searching for paths that minimize the expected end-to-end video distortion while simultaneously considering the packets skipped prior to transmission and those dropped or delayed during transmission is a highly complex problem, expected to be NP-hard, we develop a heuristic greedy-relaxation-based routing solution that enables the system to efficiently select near-optimal paths. Extensive simulation studies comparing the proposed algorithm with several existing algorithms show its superior performance. This joint rate control and multipath routing approach provides an important methodology for high-quality real-time video streaming over ad hoc wireless networks.
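The pair-of-paths selection idea can be made concrete with a toy model. This is a sketch under stated assumptions, not the paper's greedy-relaxation heuristic: it assumes independent path loss rates and two-description coding in which content is lost only when both descriptions are lost; the path names and loss values are invented.

```python
# Toy model of choosing a path pair for two-description video: send one
# description per path, and count a unit of importance as lost only when
# both copies are dropped. Exhaustive search stands in for the paper's
# heuristic, which is needed when the path space is large.
from itertools import combinations

def expected_distortion(loss_a, loss_b, importance):
    # Independent losses: distortion accrues only if both descriptions fail.
    return importance * loss_a * loss_b

def best_path_pair(paths, importance=1.0):
    """paths: {name: loss_rate}. Returns the pair minimizing expected distortion."""
    return min(combinations(paths, 2),
               key=lambda pr: expected_distortion(paths[pr[0]], paths[pr[1]],
                                                  importance))

pair = best_path_pair({"P1": 0.10, "P2": 0.30, "P3": 0.05})
```

Note the pairing favors two individually reliable paths over path diversity for its own sake; the full problem additionally accounts for packets skipped by the rate controller before transmission.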

  • Low-Delay Low-Complexity Bandwidth-Constrained Wireless Video Transmission Using SVC Over MIMO Systems

    Page(s): 1698 - 1707

    We propose an efficient strategy for the transmission of scalable video over multiple-input multiple-output (MIMO) wireless systems. In this paper, we use the latest scalable H.264 codec (SVC), which provides combined temporal, quality, and spatial scalability. At the transmitter, we estimate the decoded video distortion for given channel conditions, taking into account the effects of quantization, packet loss, and error concealment. The proposed scalable decoder distortion algorithm offers low delay and low complexity, and its performance is validated experimentally. Our system uses a MIMO configuration with orthogonal space-time block codes (O-STBC), which provides spatial diversity and guarantees independent transmission of the different symbols within the block code. The bandwidth-constrained allocation problem considered here is simplified and solved for one O-STBC symbol at a time. Furthermore, we take advantage of the hierarchical structure of SVC to attain the optimal solution for each group of pictures (GOP) of the video sequence. We incorporate the estimated decoder distortion to optimally select the application-layer parameter, i.e., the quantization parameter (QP), and the physical-layer parameters, i.e., the channel coding rate and modulation type, for wireless video transmission.
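The cross-layer selection step described above can be sketched as a constrained search over parameter combinations. Everything below is a placeholder: the candidate sets, the toy rate and distortion models, and the budget are invented for illustration and do not reproduce the paper's distortion estimator.

```python
# Hedged sketch of cross-layer parameter selection: pick the
# (QP, channel code rate, modulation) triple that minimizes an estimated
# decoder distortion subject to a bandwidth budget.
from itertools import product

def select_parameters(candidates, rate_of, distortion_of, budget):
    """candidates: iterable of (qp, code_rate, modulation) tuples."""
    feasible = [c for c in candidates if rate_of(c) <= budget]
    return min(feasible, key=distortion_of) if feasible else None

qps = [24, 30, 36]
code_rates = [0.5, 0.75]
mods = {"QPSK": 2, "16QAM": 4}           # bits per channel symbol

def rate_of(c):
    qp, r, m = c
    source_bps = 4_000_000 / qp          # toy model: lower QP -> more bits
    return source_bps / (r * mods[m])    # required channel symbols/s (toy)

def distortion_of(c):
    qp, r, m = c
    # toy model: quantization distortion grows with QP; weaker protection
    # (high code rate, dense modulation) adds channel-induced distortion
    return qp + 10 * r * mods[m] / 4

best = select_parameters(product(qps, code_rates, mods),
                         rate_of, distortion_of, budget=60_000)
```

In the paper this choice is driven by the estimated decoder distortion rather than a closed-form model, and SVC's hierarchical GOP structure keeps the per-symbol search tractable.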


Aims & Scope

The scope of the Periodical is the various aspects of research in multimedia technology and applications of multimedia.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo