IEEE Transactions on Multimedia

Issue 4 • June 2014

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    Freely Available from IEEE
  • Cloud Mobile Media: Reflections and Outlook

    Page(s): 885 - 902

    This paper surveys the emerging paradigm of cloud mobile media. We start with two alternative perspectives for cloud mobile media networks: an end-to-end view and a layered view. Summaries of existing research in this area are organized according to the layered service framework: i) cloud resource management and control in infrastructure-as-a-service (IaaS), ii) cloud-based media services in platform-as-a-service (PaaS), and iii) novel cloud-based systems and applications in software-as-a-service (SaaS). We further substantiate our proposed design principles for cloud-based mobile media using a concrete case study: a cloud-centric media platform (CCMP) developed at Nanyang Technological University. Finally, this paper concludes with an outlook on open research problems for realizing the vision of cloud-based mobile media.

    Open Access
  • Video Object Co-Segmentation via Subspace Clustering and Quadratic Pseudo-Boolean Optimization in an MRF Framework

    Page(s): 903 - 916

    Multiple videos may share a common foreground object, for instance a family member in home videos, or a leading role in various clips of a movie or TV series. In this paper, we present a novel method for co-segmenting the common foreground object from a group of video sequences, a problem that has seldom been addressed in the literature. Starting from an over-segmentation of each video into Temporal Superpixels (TSPs), we first propose a new subspace clustering algorithm which segments the videos into consistent spatio-temporal regions with multiple classes, such that the common foreground has consistent labels across different videos. The subspace clustering algorithm exploits the fact that the common foreground shares similar appearance features across different videos, while motion can better differentiate regions within each video, making accurate extraction of object boundaries easier. We further formulate video object co-segmentation as a Markov Random Field (MRF) model which imposes constraints from a foreground model that is computed automatically or specified with little user effort. Quadratic Pseudo-Boolean Optimization (QPBO) is used to generate the results. Experiments show that this video co-segmentation framework achieves good-quality foreground extraction without user interaction for videos with unrelated backgrounds, and with only moderate user interaction for videos with similar backgrounds. Comparisons with previous work also show the superiority of our approach.

  • Sport Type Classification of Mobile Videos

    Page(s): 917 - 932

    The recent proliferation of mobile video content has emphasized the need for applications such as automatic organization and automatic editing of videos. These applications could greatly benefit from domain knowledge about the content. However, extracting semantic information from mobile videos is a challenging task, due to their unconstrained nature. We extract domain knowledge about sport events recorded by multiple users, by classifying the sport type into soccer, American football, basketball, tennis, ice-hockey, or volleyball. We adopt a multi-user and multimodal approach, where each user simultaneously captures audio-visual content and auxiliary sensor data (from magnetometers and accelerometers). Firstly, each modality is separately analyzed; then, analysis results are fused for obtaining the sport type. The auxiliary sensor data is used for extracting more discriminative spatio-temporal visual features and efficient camera motion features. The contribution of each modality to the fusion process is adapted according to the quality of the input data. We performed extensive experiments on data collected at public sport events, showing the merits of using different combinations of modalities and fusion methods. The results indicate that analyzing multimodal and multi-user data, coupled with adaptive fusion, improves classification accuracies in most tested cases, up to 95.45%.
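
    As a rough illustration of the adaptive fusion step described above, the sketch below combines per-modality class posteriors with weights driven by an estimated input quality per modality. The class list, posteriors, and quality scores are invented for the example; the paper's actual fusion methods are more elaborate.

    ```python
    # Quality-adaptive late fusion sketch (illustrative, not the paper's code).
    import numpy as np

    SPORTS = ["soccer", "am_football", "basketball", "tennis", "ice_hockey", "volleyball"]

    def fuse(posteriors, qualities):
        """posteriors: one (6,) class-posterior array per modality
        (e.g., audio, visual, sensor); qualities: reliability in [0, 1]."""
        w = np.asarray(qualities, dtype=float)
        w = w / w.sum()                                # normalize modality weights
        fused = sum(wi * p for wi, p in zip(w, posteriors))
        return SPORTS[int(np.argmax(fused))]

    # Audio is confident; visual is noisy, so it gets a low quality weight.
    audio  = np.array([0.70, 0.05, 0.05, 0.05, 0.10, 0.05])
    visual = np.array([0.20, 0.30, 0.10, 0.10, 0.20, 0.10])
    sensor = np.array([0.50, 0.10, 0.10, 0.10, 0.10, 0.10])
    print(fuse([audio, visual, sensor], qualities=[0.9, 0.3, 0.6]))  # soccer
    ```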

  • Efficient H.264/AVC Video Coding with Adaptive Transforms

    Page(s): 933 - 946

    Transform coding has been widely used to remove the spatial redundancy of prediction residuals in modern video coding standards. However, since residual blocks exhibit diverse characteristics within a video sequence, conventional transform methods with fixed transform kernels may result in low efficiency. To tackle this problem, we propose a novel content-adaptive transform framework for H.264/AVC-based video coding. The proposed method utilizes pixel rearrangement to dynamically adjust the transform kernels to adapt to the video content. In addition, unlike traditional adaptive transforms, the proposed method obtains the transform kernels from the reconstructed block, and hence it consumes only one logic indicator for each transform unit. Moreover, a spiral-scanning method is developed to reorder the transform coefficients for better entropy coding. Experimental results on the Key Technical Area (KTA) platform show that the proposed method achieves average bitrate reductions of about 7.95% and 7.0% under all-intra and low-delay configurations, respectively.
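
    The sketch below illustrates only the general idea of deriving a transform kernel from already-reconstructed pixels, which is what lets a decoder mirror the derivation without extra side information. It computes a KLT-like kernel from a reconstructed block and applies it to a residual; the paper's pixel-rearrangement scheme is not reproduced here, and all data are synthetic.

    ```python
    # Kernel-from-reconstruction sketch (generic KLT, synthetic data).
    import numpy as np

    def kernel_from_reconstruction(recon_block):
        """Eigenvectors of the column covariance of a reconstructed block."""
        cols = recon_block - recon_block.mean(axis=0, keepdims=True)
        cov = cols.T @ cols / cols.shape[0]
        _, vecs = np.linalg.eigh(cov)
        return vecs[:, ::-1]                 # strongest component first

    rng = np.random.default_rng(0)
    recon = rng.integers(0, 256, size=(8, 8)).astype(float)   # decoded pixels
    residual = rng.normal(0.0, 4.0, size=(8, 8))              # prediction residual

    K = kernel_from_reconstruction(recon)
    coeffs = K.T @ residual @ K              # forward separable transform
    assert np.allclose(K @ coeffs @ K.T, residual)            # K is orthonormal
    ```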

  • An H.264 High-Profile Intra-Prediction with Adaptive Selection Between the Parallel and Pipelined Executions of Prediction Modes

    Page(s): 947 - 959

    A high-profile H.264 intra-frame encoder is suitable for low-cost, low-power applications and provides enhanced compression efficiency. Because the high profile targets high-resolution videos, its encoding speed should be faster than or comparable to that of the baseline profile. In previous work on hardware-based baseline-profile intra-frame encoders, speed-ups were achieved by early termination of the intra modes and by increasing hardware utilization under only one of the serialized and parallel schedules. This paper proposes a novel pipeline schedule for a hardware-based high-profile intra-prediction scheme in which the 8 × 8 prediction is performed in Stage 1, and the 4 × 4, 16 × 16, and chroma predictions are executed in Stage 2. The processing time of Stage 2 is efficiently accelerated based on the result of the 8 × 8 prediction in Stage 1. According to the distribution of each mode, the schedule is adaptively selected between the parallel and pipeline schedules. To increase the hardware utilization of the 8 × 8 prediction, the order of the prediction modes and the inverse vertical transform is adaptively adjusted. In addition, early termination of the prediction modes is employed for fast 8 × 8 prediction. The proposed 8 × 8 intra-prediction is implemented and verified as part of an entire intra-frame encoder. Experimental results show that the average number of cycles needed to process one macroblock for videos with resolutions of 1920 × 1080 and 3840 × 2160 is only 269 and 253 cycles, respectively. Compared to JM 13.2, the bitrate increases by 1.13% on average with a small PSNR degradation of 0.06 dB. The difference in rate-distortion performance between the proposed high-profile intra-prediction scheme and JM 13.2 is not significant, whereas the speed-up achieved by the proposed schemes is considerable compared to conventional hardware-based intra-prediction encoders.

  • Stationary Probability Model for Microscopic Parallelism in JPEG2000

    Page(s): 960 - 970

    Parallel processing is key to augmenting the throughput of image codecs. Despite numerous efforts to parallelize wavelet-based image coding systems, most attempts fail at the parallelization of the bitplane coding engine, which is the most computationally intensive stage of the coding pipeline. The main reason for this failure is the causality with which current coding strategies are devised, which assumes that one coefficient is coded after another. This work analyzes the mechanisms employed in bitplane coding and proposes alternatives to enhance opportunities for parallelism. We describe a stationary probability model that, without sacrificing the advantages of current approaches, removes the main obstacle to the parallelization of most coding strategies. Experimental tests evaluate the coding performance achieved by the proposed method in the framework of JPEG2000 when coding different types of images. Results indicate that the stationary probability model achieves similar coding performance, with slight increments or decrements depending on the image type and the desired level of parallelism.
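
    The sketch below shows what a stationary context model buys: with fixed per-context probabilities, each symbol's ideal code cost depends only on its own context, so the computation becomes an elementwise (parallelizable) map rather than a sequential update loop. The contexts and probabilities are illustrative, not JPEG2000's actual context set.

    ```python
    # Stationary context model sketch: per-symbol costs as a parallel map.
    import numpy as np

    # Fixed (stationary) probability of the "1" symbol for each context id.
    P1 = np.array([0.05, 0.20, 0.45])

    def parallel_bit_costs(bits, contexts):
        """Ideal code cost per symbol, in bits. Elementwise: no symbol
        depends on previously coded symbols, unlike an adaptive model."""
        p = np.where(bits == 1, P1[contexts], 1.0 - P1[contexts])
        return -np.log2(p)

    bits     = np.array([0, 1, 0, 0, 1, 1])
    contexts = np.array([0, 2, 1, 0, 2, 2])
    print(parallel_bit_costs(bits, contexts).sum())  # total ideal code length
    ```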

  • Physical Metaphor for Streaming Media Retargeting

    Page(s): 971 - 979

    We introduce an image/video retargeting method that resizes to arbitrary aspect ratios in real time. Most retargeting approaches in the literature sacrifice real-time performance on behalf of quality, while existing fast methods produce questionable results. Our method obtains a valuable trade-off between effectiveness and efficiency. The method, named Spring Simulation Retargeting (SSR), is based on a physical spring simulation. The media are modeled as flexible objects composed of particles and springs with different local stiffness properties, related to the visual importance of the content. The variation of the object size generates elastic forces which determine a new arrangement of the particles, according to the elongation of their connected springs. Deformations are mostly introduced in regions of low-importance content, while high-saliency regions are preserved. The proposed method is evaluated on both images and videos against several state-of-the-art methods, and a user study is conducted to assess the results, showing the value of the approach.
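
    A minimal one-dimensional version of the spring idea, under strong simplifying assumptions: image columns act as springs in series whose stiffness grows with visual importance, so one global force stretches or compresses the row while salient (stiff) columns deform least. The saliency values and stiffness offset below are made up.

    ```python
    # 1-D spring-chain resize sketch: stiff (salient) columns deform least.
    import numpy as np

    def resize_columns(saliency, new_width, k_min=0.2):
        k = k_min + saliency                 # stiffness per unit-width column
        old_width = len(saliency)
        force = (new_width - old_width) / np.sum(1.0 / k)  # same force per spring
        return 1.0 + force / k               # new width of each column

    sal = np.array([0.1, 0.9, 0.9, 0.1, 0.1])    # middle columns are important
    widths = resize_columns(sal, new_width=4.0)
    print(widths)                                # salient columns shrink least
    print(widths.sum())                          # total width = 4.0
    ```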

  • Image Similarity Using Sparse Representation and Compression Distance

    Page(s): 980 - 987

    A new line of research uses compression methods to measure the similarity between signals. Two signals are considered similar if one can be compressed significantly when the information of the other is known. The existing compression-based similarity methods, although successful in the discrete one-dimensional domain, do not work well in the context of images. This paper proposes a sparse representation-based approach that encodes the information content of an image using information from the other image, and uses the compactness (sparsity) of the representation as a measure of its compressibility (how much the image can be compressed) with respect to the other image. The sparser the representation of an image, the better it can be compressed and the more similar it is to the other image. The efficacy of the proposed measure is demonstrated through the high accuracies achieved in image clustering, retrieval, and classification.
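
    A small sketch of the core measurement, assuming synthetic patch data: sparse-code the patches of one image over a dictionary built from the other image's patches, and read off the average number of atoms needed as a proxy for compressibility. Patch size, tolerance, and data are illustrative.

    ```python
    # Sparsity-as-compressibility sketch on synthetic "patches" (illustrative).
    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    def avg_atoms_needed(target_patches, dict_patches, tol=1e-6):
        """Mean number of dictionary atoms OMP needs per target patch."""
        D = dict_patches / np.linalg.norm(dict_patches, axis=0, keepdims=True)
        codes = orthogonal_mp(D, target_patches, tol=tol)
        return np.count_nonzero(codes) / target_patches.shape[1]

    rng = np.random.default_rng(0)
    dict_b = rng.normal(size=(64, 100))          # patches of image B (atoms)
    idx = rng.integers(0, 100, size=(3, 50))
    img_a = dict_b[:, idx].sum(axis=1)           # A: 3-atom mixes of B's patches
    img_c = rng.normal(size=(64, 50))            # C: unrelated content

    print(avg_atoms_needed(img_a, dict_b))       # few atoms: A similar to B
    print(avg_atoms_needed(img_c, dict_b))       # many atoms: C dissimilar to B
    ```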

  • Corruptive Artifacts Suppression for Example-Based Color Transfer

    Page(s): 988 - 999

    Example-based color transfer is a critical operation in image editing but often suffers from corruptive artifacts in the mapping process. In this paper, we propose a novel unified color transfer framework with corruptive artifact suppression, which performs iterative probabilistic color mapping with a self-learning filtering scheme and a multiscale detail manipulation scheme, minimizing the normalized Kullback-Leibler distance. First, iterative probabilistic color mapping is applied to construct the mapping relationship between the reference and target images. Then, a self-learning filtering scheme is applied in the transfer process to prevent artifacts and extract details. The transferred output and the extracted multi-level details are integrated by minimizing this measure to yield the final result. Our framework seamlessly achieves grain suppression, color fidelity, and detail preservation. For demonstration, a series of objective and subjective measurements are used to evaluate the color transfer quality. Finally, a few extended applications are implemented to show the applicability of this framework.

    Open Access
  • Variational Bayesian Methods For Multimedia Problems

    Page(s): 1000 - 1017

    In this paper we present an introduction to Variational Bayesian (VB) methods in the context of probabilistic graphical models and discuss their application to multimedia-related problems. VB is a family of deterministic procedures for approximating probability distributions that offers distinct advantages over alternative approaches based on stochastic sampling and over those providing only point estimates. VB inference is flexible enough to be applied to many practical problems, yet broad enough to subsume as special cases several alternative inference approaches, including Maximum A Posteriori (MAP) and the Expectation-Maximization (EM) algorithm. We also show the connections between VB and other posterior approximation methods, such as the marginalization-based Loopy Belief Propagation (LBP) and Expectation Propagation (EP) algorithms. Specifically, both VB and EP are variational methods that minimize functionals based on the Kullback-Leibler (KL) divergence. LBP, traditionally developed using graphical models, can also be viewed as a VB inference procedure. We present several multimedia-related applications illustrating the use and effectiveness of the VB algorithms discussed herein. We hope that readers of this tutorial will obtain a general understanding of Bayesian methods and establish connections among the popular algorithms used in practice.
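
    As a concrete example of VB in its simplest setting, the sketch below runs the classic mean-field updates for the mean and precision of one-dimensional Gaussian data (Bishop, PRML Sec. 10.1.3), iteratively minimizing KL(q || p). Priors and data are illustrative.

    ```python
    # Mean-field VB for the mean and precision of Gaussian data (Bishop 10.1.3).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=0.5, size=200)
    N, xbar = len(x), x.mean()

    mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0       # conjugate prior hyperparameters
    E_tau = a0 / b0                              # initial guess for E[tau]

    for _ in range(50):
        # q(mu) = Normal(m, 1/lam), given the current E[tau]
        lam = (lam0 + N) * E_tau
        m = (lam0 * mu0 + N * xbar) / (lam0 + N)
        # q(tau) = Gamma(a, b), given the current moments of q(mu)
        a = a0 + (N + 1) / 2.0
        sq = np.sum((x - m) ** 2) + N / lam      # E_q[sum_i (x_i - mu)^2]
        b = b0 + 0.5 * (sq + lam0 * ((m - mu0) ** 2 + 1.0 / lam))
        E_tau = a / b

    print("estimated mean:", m)                  # close to 2.0
    print("estimated std:", (b / a) ** 0.5)      # close to 0.5
    ```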

  • Hire me: Computational Inference of Hirability in Employment Interviews Based on Nonverbal Behavior

    Page(s): 1018 - 1031

    Understanding the basis on which recruiters form hirability impressions for a job applicant is a key issue in organizational psychology and can be addressed as a social computing problem. We approach the problem from a face-to-face, nonverbal perspective where behavioral feature extraction and inference are automated. This paper presents a computational framework for the automatic prediction of hirability. To this end, we collected an audio-visual dataset of real job interviews where candidates were applying for a marketing job. We automatically extracted audio and visual behavioral cues related to both the applicant and the interviewer. We then evaluated several regression methods for the prediction of hirability scores and showed the feasibility of conducting such a task, with ridge regression explaining 36.2% of the variance. Feature groups were analyzed, and two main groups of behavioral cues were predictive of hirability: applicant audio features and interviewer visual cues, showing the predictive validity of cues related not only to the applicant, but also to the interviewer. As a last step, we analyzed the predictive validity of psychometric questionnaires often used in the personnel selection process, and found that these questionnaires were unable to predict hirability, suggesting that hirability impressions were formed based on the interaction during the interview rather than on questionnaire data.
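
    A minimal sketch of the prediction step on synthetic stand-in data: ridge regression from behavioral-cue features to hirability scores, with cross-validated R^2 as the variance explained. The feature count, sample size, and noise level are invented.

    ```python
    # Ridge-regression hirability prediction sketch on synthetic data.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(62, 20))        # 62 interviews x 20 nonverbal cues
    w_true = rng.normal(size=20)
    y = X @ w_true + rng.normal(scale=2.0, size=62)  # noisy hirability scores

    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print("mean variance explained:", r2.mean())
    ```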

  • Simultaneous-Speaker Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs

    Page(s): 1032 - 1044

    Humans can extract speech signals that they need to understand from a mixture of background noise, interfering sound sources, and reverberation for effective communication. Voice Activity Detection (VAD) and Sound Source Localization (SSL) are the key signal processing components that humans perform by processing sound signals received at both ears, sometimes with the help of visual cues by locating and observing the lip movements of the speaker. Both VAD and SSL serve as the crucial design elements for building applications involving human speech. For example, systems with microphone arrays can benefit from these for robust speech capture in video conferencing applications, or for speaker identification and speech recognition in Human Computer Interfaces (HCIs). The design and implementation of robust VAD and SSL algorithms in practical acoustic environments are still challenging problems, particularly when multiple simultaneous speakers exist in the same audiovisual scene. In this work we propose a multimodal approach that uses Support Vector Machines (SVMs) and Hidden Markov Models (HMMs) for assessing the video and audio modalities through an RGB camera and a microphone array. By analyzing the individual speakers' spatio-temporal activities and mouth movements, we propose a mid-fusion approach to perform both VAD and SSL for multiple active and inactive speakers. We tested the proposed algorithm in scenarios with up to three simultaneous speakers, showing an average VAD accuracy of 95.06% with an average error of 10.9 cm when estimating the three-dimensional locations of the speakers.

  • Self-Sorting Map: An Efficient Algorithm for Presenting Multimedia Data in Structured Layouts

    Page(s): 1045 - 1058

    This paper presents the Self-Sorting Map (SSM), a novel algorithm for organizing and presenting multimedia data. Given a set of data items and a dissimilarity measure between each pair of them, the SSM places each item into a unique cell of a structured layout, where the most related items are placed together and the unrelated ones are spread apart. The algorithm integrates ideas from dimension reduction, sorting, and data clustering algorithms. Instead of solving the continuous optimization problem that other dimension reduction approaches do, the SSM transforms it into a discrete labeling problem. As a result, it can organize a set of data into a structured layout without overlap, providing a simple and intuitive presentation. The algorithm is designed for sorting all data items in parallel, making it possible to arrange millions of items in seconds. Experiments on different types of data demonstrate the SSM's versatility in a variety of applications, ranging from positioning city names by proximities to presenting images according to visual similarities, to visualizing semantic relatedness between Wikipedia articles.
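
    A one-dimensional sketch of the swap-based mechanism, under strong simplifications: items occupy grid cells, and two items exchange cells whenever the swap moves each closer to the mean feature of its neighborhood. The real SSM operates on 2-D grids with a coarse-to-fine block schedule and parallel swaps; this shows only the core swap test.

    ```python
    # One-dimensional self-sorting sketch: swap when coherence improves.
    import numpy as np

    rng = np.random.default_rng(0)
    items = rng.permutation(20).astype(float)    # scalar "features", shuffled

    def neighborhood_mean(cells, i, radius=2):
        lo, hi = max(0, i - radius), min(len(cells), i + radius + 1)
        return cells[lo:hi].mean()

    for _ in range(2000):
        i, j = rng.integers(0, len(items), size=2)
        ti, tj = neighborhood_mean(items, i), neighborhood_mean(items, j)
        before = abs(items[i] - ti) + abs(items[j] - tj)
        after = abs(items[j] - ti) + abs(items[i] - tj)
        if after < before:                       # swap only if both fit better
            items[i], items[j] = items[j], items[i]

    print(items)    # similar values end up adjacent (roughly sorted)
    ```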

  • A Cross-Modal Approach for Extracting Semantic Relationships Between Concepts Using Tagged Images

    Page(s): 1059 - 1074

    This paper presents a cross-modal approach for extracting semantic relationships between concepts using tagged images. In the proposed method, we first project both text and visual features of the tagged images to a latent space using canonical correlation analysis (CCA). Then, under the probabilistic interpretation of CCA, we calculate a representative distribution of the latent variables for each concept. Based on the representative distributions of the concepts, we derive two types of measures: the semantic relatedness between the concepts and the abstraction level of each concept. Because these measures are derived from a cross-modal scheme that enables the collaborative use of both text and visual features, the semantic relationships can successfully reflect semantic and visual contexts. Experiments conducted on tagged images collected from Flickr show that our measures are more consistent with human cognition than conventional measures that use either text or visual features alone, or WordNet-based measures. In particular, a new measure of semantic relatedness, which satisfies the triangle inequality, obtains the best results among the distance measures in our framework. The applicability of our measures to multimedia-related tasks such as concept clustering, image annotation, and tag recommendation is also shown in the experiments.
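
    A minimal sketch of the shared latent space, using synthetic paired features: fit CCA on text and visual features, fuse the two projections per image, and compare concepts through their mean latent vectors. The dimensions, concept labels, and the simple Euclidean relatedness are illustrative stand-ins for the paper's probabilistic measures.

    ```python
    # CCA latent-space sketch with synthetic paired text/visual features.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    n = 300
    z = rng.normal(size=(n, 4))                  # hidden shared semantics
    text   = z @ rng.normal(size=(4, 50))  + 0.1 * rng.normal(size=(n, 50))
    visual = z @ rng.normal(size=(4, 120)) + 0.1 * rng.normal(size=(n, 120))

    t_lat, v_lat = CCA(n_components=4).fit_transform(text, visual)
    latent = (t_lat + v_lat) / 2                 # fuse both views per image

    labels = rng.integers(0, 3, size=n)          # pretend 3 tag concepts
    means = [latent[labels == c].mean(axis=0) for c in range(3)]
    print(np.linalg.norm(means[0] - means[1]))   # smaller = more related
    ```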

  • Corpus Development for Affective Video Indexing

    Page(s): 1075 - 1089

    Affective video indexing is the area of research that develops techniques to automatically generate descriptions of video content encoding the emotional reactions that the video evokes in viewers. This paper provides a set of corpus development guidelines, based on state-of-the-art practice, intended to support researchers in this field. Affective descriptions can be used for video search and browsing systems offering users affective perspectives. The paper is motivated by the observation that affective video indexing has yet to fully profit from the standard corpora (data sets) that have benefited conventional forms of video indexing. Affective video indexing faces unique challenges, since viewer-reported affective reactions are difficult to assess. Moreover, affect assessment efforts must be carefully designed both to cover the types of affective responses that video content evokes in viewers and to capture the stable and consistent aspects of these responses. We first present background information on affect and multimedia and related work on affective multimedia indexing, including existing corpora. Three dimensions emerge as critical for affective video corpora and form the basis for our proposed guidelines: the context of viewer response, personal variation among viewers, and the effectiveness and efficiency of corpus creation. Finally, we present examples of three recent corpora and discuss how they make progressive steps towards fulfilling the guidelines.

  • Discrete Cosine Transform Locality-Sensitive Hashes for Face Retrieval

    Page(s): 1090 - 1103

    Descriptors such as local binary patterns perform well for face recognition. Searching large databases using such descriptors has been problematic due to the cost of linear search and the inadequate performance of existing indexing methods. We present Discrete Cosine Transform (DCT) hashing for creating index structures for face descriptors. Hashes play the role of keywords: an index is created and queried to find the images most similar to the query image. Common hash suppression is used to improve retrieval efficiency and accuracy. Results are shown on a combination of six publicly available face databases (LFW, FERET, FEI, BioID, Multi-PIE, and RaFD). DCT hashing is shown to have significantly better retrieval accuracy and to be more efficient than other popular state-of-the-art hashing algorithms.
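
    A toy sketch of DCT-based hashing for indexing, with invented parameters: reshape a face descriptor to a grid, take its 2-D DCT, and sign-quantize a few low-frequency AC coefficients into a short key for an inverted index. The real method's descriptor pipeline and common hash suppression are omitted.

    ```python
    # DCT sign-hash indexing sketch with synthetic descriptors.
    import numpy as np
    from scipy.fft import dctn
    from collections import defaultdict

    def hash_key(descriptor, n_bits=8):
        grid = descriptor.reshape(16, 16)
        c = dctn(grid, norm="ortho").ravel()[1:n_bits + 1]  # low-freq AC terms
        return tuple((c > 0).astype(int))        # sign-quantized key

    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(1000, 256))       # 1000 face descriptors
    index = defaultdict(list)
    for i, d in enumerate(gallery):
        index[hash_key(d)].append(i)             # inverted index: key -> ids

    query = gallery[42] + 0.01 * rng.normal(size=256)   # near-duplicate face
    print(42 in index[hash_key(query)])          # likely True: same bucket
    ```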

  • Contextual Query Expansion for Image Retrieval

    Page(s): 1104 - 1114

    In this paper, we study the problem of image retrieval by introducing contextual query expansion to address two shortcomings of bag-of-words based frameworks: the semantic gap of visual word quantization, and the efficiency and storage loss due to query expansion. Our method is built on common visual patterns (CVPs), which are distinctive visual structures shared between two images and carry rich contextual information. With CVPs, two contextual query expansions, at the visual word level and the image level, are explored. For visual word-level expansion, we find contextual synonymous visual words (CSVWs) and expand a word in the query image with its CSVWs to boost retrieval accuracy. CSVWs are words that appear in the same CVPs and have the same contextual meaning, i.e., similar spatial layout and geometric transformations. For image-level expansion, database images that share CVPs are organized in a linked list, and images that share CVPs with the query image but are not included in the results are automatically expanded. The main computation of these two expansions is carried out offline, and both can be integrated into the inverted file and efficiently applied to all images in the dataset. Experiments conducted on three reference datasets and a dataset of one million images demonstrate the effectiveness and efficiency of our method.
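
    A minimal sketch of word-level expansion over an inverted file: each visual word may have contextual synonyms mined offline, and the query's words are expanded with them before the lookup. The toy index and synonym table below stand in for CVP mining.

    ```python
    # Word-level contextual expansion over a toy inverted index.
    from collections import defaultdict

    inverted = defaultdict(set)                  # visual word -> image ids
    for img, words in {1: {10, 11}, 2: {12, 30}, 3: {11, 30}}.items():
        for w in words:
            inverted[w].add(img)

    synonyms = {10: {12}, 30: {31}}              # contextual synonymous words

    def search(query_words):
        expanded = set(query_words)
        for w in query_words:
            expanded |= synonyms.get(w, set())   # expand with CSVWs
        hits = defaultdict(int)
        for w in expanded:
            for img in inverted[w]:
                hits[img] += 1                   # count matched words per image
        return sorted(hits, key=hits.get, reverse=True)

    print(search({10, 11}))    # image 2 now matches via synonym 12
    ```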

  • Image Attribute Adaptation

    Page(s): 1115 - 1126

    Visual attributes can be considered a middle-level semantic cue that bridges the gap between low-level image features and high-level object classes. Thus, attributes have the advantage of transcending specific semantic categories or describing objects across categories. Since attributes are often human-nameable and domain-specific, much work constructs attribute annotations ad hoc or takes them from an application-dependent ontology. To facilitate other applications with attributes, it is necessary to develop methods which can adapt a well-defined set of attributes to novel images. In this paper, we propose a framework for image attribute adaptation. The goal is to automatically adapt the knowledge of attributes from a well-defined auxiliary image set to a target image set, thus assisting in predicting appropriate attributes for target images. In the proposed framework, we use a non-linear mapping function corresponding to multiple base kernels to map each training image of both the auxiliary and target sets to a Reproducing Kernel Hilbert Space (RKHS), where we reduce the mismatch of data distributions between auxiliary and target images. To make use of unlabeled images, we incorporate a semi-supervised learning process. We also introduce a robust loss function into our framework to remove the shared irrelevance and noise of the training images. Experiments on two pairs of auxiliary-target image sets demonstrate that the proposed framework predicts attributes for target testing images better than three baselines and two state-of-the-art domain adaptation methods.
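
    As one concrete instance of measuring the distribution mismatch that such frameworks reduce in an RKHS, the sketch below computes a (biased) maximum mean discrepancy between auxiliary and target features under an RBF kernel. The data are synthetic, and MMD is used here as a generic stand-in for the paper's criterion.

    ```python
    # Maximum mean discrepancy (MMD) sketch between two feature sets.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def mmd2(X, Y, gamma=0.5):
        """Biased squared MMD under an RBF kernel."""
        return (rbf_kernel(X, X, gamma=gamma).mean()
                + rbf_kernel(Y, Y, gamma=gamma).mean()
                - 2.0 * rbf_kernel(X, Y, gamma=gamma).mean())

    rng = np.random.default_rng(0)
    aux    = rng.normal(loc=0.0, size=(200, 10))   # auxiliary image features
    target = rng.normal(loc=0.5, size=(200, 10))   # shifted target domain
    print(mmd2(aux, target))   # clearly positive: the domains mismatch
    print(mmd2(aux, aux))      # exactly 0: identical sample sets
    ```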

  • A Prior-Free Weighting Scheme for Binary Code Ranking

    Page(s): 1127 - 1139

    Fast similarity search has been a research focus in recent years. Binary hashing, which embeds high-dimensional data points into Hamming space, is a promising way to accelerate similarity search, since its search process can be performed in real time using Hamming distance as the similarity metric. However, as Hamming distance is discrete and bounded by the code length, its resolution is limited. In practice, there are often many results sharing the same Hamming distance to a query, which poses a critical issue for problems where ranking is important. This paper proposes a weighted Hamming distance ranking algorithm (WhRank) to give a better ranking of results with equal Hamming distances to a query. By assigning different bit-level weights to different bits, WhRank is able to distinguish the relative importance of different bits and to rank results at a finer-grained hash-code level rather than at the original integer Hamming distance level. We show that an effective weight is not only data-adaptive but also query-sensitive, and give a simple yet effective prior-free weight learning algorithm. Evaluations on three large-scale image datasets containing up to one million points demonstrate the efficacy of the proposed algorithm.
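
    A minimal sketch of the ranking idea: plain Hamming distance ties many candidates, while per-bit weights break the ties at a finer granularity. The weights below are made up; WhRank learns data-adaptive, query-sensitive ones.

    ```python
    # Weighted Hamming tie-breaking sketch with made-up bit weights.
    import numpy as np

    def hamming(a, b):
        return int(np.sum(a != b))

    def weighted_hamming(a, b, w):
        return float(np.sum(w[a != b]))      # sum weights of differing bits

    q  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    c1 = np.array([1, 0, 0, 1, 0, 0, 1, 1])     # differs in bits 2 and 7
    c2 = np.array([0, 0, 1, 1, 0, 1, 1, 0])     # differs in bits 0 and 5
    w  = np.array([0.9, 0.3, 0.2, 0.8, 0.4, 0.1, 0.7, 0.2])

    print(hamming(q, c1), hamming(q, c2))        # tied at distance 2
    print(weighted_hamming(q, c1, w), weighted_hamming(q, c2, w))  # 0.4 vs 1.0
    ```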

  • An Adaptive Mechanism for Optimal Content Download in Wireless Networks

    Page(s): 1140 - 1155

    This paper presents an adaptive mechanism for improving content download in wireless environments. The solution is based on the use of the File Delivery over Unidirectional Transport (FLUTE) protocol in multicast networks, which considerably reduces bandwidth usage when many users are interested in the same content. Specifically, the proposed system reduces the average download time of clients within the coverage area, thus improving the Quality of Experience. To that end, clients periodically send feedback messages to the server reporting the losses they are experiencing. With this information, the server determines the optimal application-layer forward error correction (AL-FEC) code rate that minimizes the average download time, taking into account the channel bandwidth, and starts sending data at that code rate. The proposed system is evaluated in various scenarios, considering different distributions of losses in the coverage area. Results show that the proposed adaptive solution is well suited to wireless networks with limited bandwidth.
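
    A toy model of the server-side choice, under simplified assumptions: given the worst loss rate reported by clients, pick the highest AL-FEC code rate whose repair overhead still covers the losses, since download time scales inversely with the code rate. The rate set, coverage margin, and timing formula are invented stand-ins for the paper's optimization.

    ```python
    # Toy adaptive AL-FEC code-rate selection from client loss feedback.
    CODE_RATES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

    def pick_code_rate(reported_losses, margin=0.05):
        """Highest rate whose repair overhead still covers the worst loss."""
        worst = max(reported_losses)
        usable = [r for r in CODE_RATES if r <= 1.0 - worst - margin]
        return max(usable) if usable else min(CODE_RATES)

    def download_time(size_mb, bandwidth_mbps, code_rate):
        return size_mb * 8 / (bandwidth_mbps * code_rate)   # seconds

    losses = [0.02, 0.10, 0.05]          # feedback from three clients
    rate = pick_code_rate(losses)
    print(rate, download_time(100, 10, rate))   # rate 0.8 -> 100 s
    ```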

  • PicWords: Render a Picture by Packing Keywords

    Page(s): 1156 - 1164

    In this paper, we propose a novel text-art system: the input is a source picture and some keywords conveying information about it, and the output is the so-called PicWords, a rendering of the source picture composed of the introduction keywords. Unlike traditional text-graphics, which are created by highly skilled artists and involve a huge amount of tedious manual work, PicWords is an automatic non-photorealistic rendering (NPR) packing system. Given a source picture, we first generate its silhouette, a binary image containing a Yang part and a Yin part. The Yang part is used for keyword placement, while the Yin part can be ignored. Next, the Yang part is further over-segmented into small patches, each of which serves as a container for one keyword. To ensure that more important keywords are placed into more salient and larger image patches, we rank both the patches and the keywords and construct a correspondence between the two lists. Then, the mean value coordinates method is used for keyword-patch warping. Finally, post-processing techniques are adopted to improve the aesthetics of the PicWords. Extensive experimental results demonstrate the effectiveness of the proposed PicWords system.

  • Regularity Preserved Superpixels and Supervoxels

    Page(s): 1165 - 1175

    Most existing superpixel algorithms ignore spatial structure and regularity properties, which results in undesirable sizes and location relationships for subsequent processing. In this paper, we introduce a new method to generate regularity-preserving superpixels. Starting from lattice seeds, our method relocates each seed to the pixel with locally maximal edge magnitude and treats it as a superpixel junction. Then, the shortest-path algorithm is employed to find the locally optimal boundary connecting each adjacent junction pair. Thanks to the local constraints, our method obtains homogeneous, adjacency-preserving superpixels in weakly textured and uniform regions while preserving boundary adherence in high-contrast content. Our method preserves the regularity property without significantly sacrificing segmentation accuracy. Moreover, we extend this regularity constraint to generate supervoxels, which preserve the structural relations in both the spatial and temporal dimensions of the video. Quantitative and qualitative experimental results on benchmark datasets demonstrate that our simple but effective method outperforms existing regular superpixel methods.

  • Interruption Probability of Wireless Video Streaming With Limited Video Lengths

    Page(s): 1176 - 1180

    In this paper, we consider a simple queueing theoretic method to predict the video interruption probability for a given video length. Specifically, a mobile user is streaming a video with a limited length and variable bit rate video encoding. The playback interruptions are caused by random packet delays occurring in the wireless link between the source and destination nodes. The dynamics of the playback buffer in the user terminal is modeled as a G/G/1 queue. To evaluate the video interruption probability, a simple asymptotic method has been presented for the case in which the video length approaches infinity. However, in many practical cases, the video length is limited, hindering the usage of the asymptotic method. We obtain a simple, closed-form upper bound for the analysis of the interruption probability that incorporates the effect of finite video lengths with known statistical delay parameters. Furthermore, a useful method is presented to select between the proposed method and the asymptotic method, whose relative accuracy changes with the video length and the statistical properties of the buffer load size. The accuracy of the proposed analytical method is compared with that of existing methods. Finally, we address some practical challenges in buffer dimensioning when the statistical delay parameters are unknown and must be estimated from a finite number of received packets.
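
    A Monte Carlo sketch of the playback model, with illustrative parameters: packets arrive with random delays, playback starts after prebuffering b0 packets, and an interruption occurs if the player ever catches up with the arrivals before the finite-length video ends.

    ```python
    # Monte Carlo estimate of playback interruption for a finite video.
    import numpy as np

    def interruption_prob(n_packets=500, b0=20, trials=2000, seed=0):
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(trials):
            gaps = rng.exponential(1.0, size=n_packets)   # random delays
            arrivals = np.cumsum(gaps)
            start = arrivals[b0 - 1]                      # prebuffer b0 packets
            playback = start + np.arange(1, n_packets + 1)  # 1 packet/time unit
            if np.any(arrivals > playback):               # late packet => stall
                hits += 1
        return hits / trials

    print(interruption_prob(b0=5))     # short prebuffer: stalls are common
    print(interruption_prob(b0=60))    # longer prebuffer: stalls become rare
    ```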


Aims & Scope

The scope of this periodical covers the various aspects of research in multimedia technology and the applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo