IEEE Transactions on Multimedia

Issue 2 • February 2014

Displaying Results 1 - 25 of 32
  • Table of contents

    Page(s): C1 - C4
    PDF (158 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    PDF (130 KB)
    Freely Available from IEEE
  • Semi-Supervised Multiple Feature Analysis for Action Recognition

    Page(s): 289 - 298
    PDF (2016 KB) | HTML

    This paper presents a semi-supervised method for categorizing human actions using multiple visual features. The proposed algorithm simultaneously learns multiple features from a small number of labeled videos, and automatically exploits the data distributions of labeled and unlabeled data to boost recognition performance. Shared structural analysis is applied in our approach to discover a common subspace shared by all feature types. In this subspace, the proposed algorithm is able to characterize more discriminative information for each feature type, while the data distribution information of each feature type is preserved. These attributes make our algorithm robust for action recognition, especially when only limited labeled training samples are provided. Extensive experiments have been conducted on both choreographed and realistic video datasets, including KTH, YouTube Action and UCF50. Experimental results show that our method outperforms several state-of-the-art algorithms; most notably, much better performance is achieved when there are only a few labeled training samples.

  • A Data-Driven Approach for Facial Expression Retargeting in Video

    Page(s): 299 - 310
    PDF (2458 KB) | HTML

    This paper presents a data-driven approach for facial expression retargeting in video, i.e., synthesizing a face video of a target subject that mimics the expressions of a source subject in the input video. Our approach takes advantage of a pre-existing facial expression database of the target subject to achieve realistic synthesis. First, for each frame of the input video, a new facial expression similarity metric is proposed for querying the expression database of the target person to select multiple candidate images that are most similar to the input. The similarity metric is developed using a metric learning approach to reliably handle appearance differences between subjects. Second, we employ an optimization approach to choose the best candidate image for each frame, resulting in a retrieved sequence that is temporally coherent. Finally, a spatio-temporal expression mapping method is employed to further improve the synthesized sequence. Experimental results show that our system is capable of generating high-quality facial expression videos that match well with the input sequences, even when the source and target subjects differ substantially in identity. In addition, extensive evaluations demonstrate the high accuracy of the learned expression similarity metric and the effectiveness of our retrieval strategy.

  • Adaptive Watermarking and Tree Structure Based Image Quality Estimation

    Page(s): 311 - 325
    PDF (3595 KB) | HTML

    Image quality evaluation is essential in applications involving signal transmission, where Reduced- or No-Reference quality metrics are generally more practical than Full-Reference metrics. In this study, we propose a quality estimation method based on a novel semi-fragile and adaptive watermarking scheme. The proposed scheme uses the embedded watermark to estimate the degradation of the cover image under different distortions. The watermarking process is implemented in the DWT domain of the cover image. The correlated DWT coefficients across the DWT subbands are organized into Set Partitioning in Hierarchical Trees (SPIHT), and these SPIHT trees are further decomposed into a set of bitplanes. The watermark is embedded into selected bitplanes of selected DWT coefficients of selected trees without causing significant fidelity loss to the cover image. The accuracy of the quality estimation is made to approach that of Full-Reference metrics by referring to an "Ideal Mapping Curve" computed a priori. The experimental results show that the proposed scheme can estimate image quality in terms of PSNR, wPSNR, JND and SSIM with high accuracy under JPEG compression, JPEG2000 compression, Gaussian low-pass filtering and Gaussian noise distortion. The results also show that the proposed scheme is computationally efficient enough for practical applications.

  • Efficient Viewer-Centric Depth Adjustment Based on Virtual Fronto-Parallel Planar Projection in Stereo 3D Images

    Page(s): 326 - 336
    PDF (2364 KB) | HTML

    This paper presents an efficient method for adjusting the 3D depth of an object, or even an entire scene, in stereo 3D images by utilizing a virtual fronto-parallel planar projection in the 3D space perceived by the viewer. The proposed method requires only object correspondence, rather than accurate estimation of the disparity field or point correspondence. We simulate the depth adjustment of a 3D point perceived by the viewer, through a corresponding pair of points in the stereo images, by moving the virtual fronto-parallel plane onto which the left and right points are projected. We show that the resulting transformation of the image coordinates can be expressed simply by a scale factor and two translations, all depending on a single depth-adjustment parameter. The experimental results demonstrate the feasibility of the proposed approach, which yields less visual fatigue and smaller 3D shape distortion than the conventional parallax adjustment method. The overall procedure can be efficiently applied to each frame of a stereo video without causing any artifacts.
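
    A minimal sketch of the scale-plus-translation form described above; the function and its coefficients are hypothetical, since the paper derives the scale factor and the two translations from the viewing geometry and the single depth-adjustment parameter:

        def adjust_pair(xl, xr, y, s, t_left, t_right):
            # Move a corresponding left/right point pair by one scale factor
            # and two per-view translations (illustrative only; s, t_left and
            # t_right would all be functions of one depth parameter).
            return (s * xl + t_left, s * y), (s * xr + t_right, s * y)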

  • Point Cloud Encoding for 3D Building Model Retrieval

    Page(s): 337 - 345
    PDF (1887 KB) | HTML

    An increasing number of three-dimensional (3D) building models are being made available on Web-based model-sharing platforms. Motivated by the concept of data reuse, an encoding approach is proposed for 3D building model retrieval using point clouds acquired by airborne light detection and ranging (LiDAR) systems. To encode LiDAR point clouds with sparse, noisy, and incomplete sampling, we introduce a novel encoding scheme based on a set of low-frequency spherical harmonic basis functions. These functions provide a compact representation and mitigate the encoding difficulty caused by the inherent noise of point clouds. Additionally, a data filling and resampling technique is proposed to solve the aliasing problem caused by the sparse and incomplete sampling of point clouds. Qualitative and quantitative analyses of LiDAR data show a clear superiority of the proposed method over related methods. A cyber campus generated by retrieving 3D building models with airborne LiDAR point clouds demonstrates the feasibility of the proposed method.
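
    As a rough illustration of encoding a shape with low-frequency spherical harmonics, here is a generic rotation-robust descriptor sketch; it is not the paper's exact scheme, and sh_descriptor and its parameters are assumptions for illustration:

        import numpy as np
        from scipy.special import sph_harm  # Y_l^m(theta, phi)

        def sh_descriptor(points, l_max=4):
            # Project radii of a centred point cloud onto low-frequency
            # spherical harmonics and keep per-band magnitudes.
            p = points - points.mean(axis=0)
            r = np.linalg.norm(p, axis=1) + 1e-12
            theta = np.arctan2(p[:, 1], p[:, 0])           # azimuth
            phi = np.arccos(np.clip(p[:, 2] / r, -1, 1))   # polar angle
            feats = []
            for l in range(l_max + 1):
                coeffs = [np.mean(r * np.conj(sph_harm(m, l, theta, phi)))
                          for m in range(-l, l + 1)]
                feats.append(np.linalg.norm(coeffs))       # energy in band l
            return np.asarray(feats)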

  • Towards Mobile Document Image Retrieval for Digital Library

    Page(s): 346 - 359
    PDF (2978 KB) | HTML

    With the proliferation of mobile devices, recent years have witnessed an emerging potential to integrate mobile visual search techniques into digital libraries. Such a mobile application scenario poses significant and unique challenges for document image search. Mobile photographs make it difficult to extract discriminative features from the landmark regions of documents, such as line drawings and text layouts. In addition, both search scalability and query delivery latency remain challenging issues in mobile document search: the former relies on an effective yet memory-light indexing structure to accomplish fast online search, while the latter imposes a bit-budget constraint on query images sent over the wireless link. In this paper, we propose a novel mobile document image retrieval framework consisting of a robust Local Inner-distance Shape Context (LISC) descriptor of line drawings, a Hamming distance KD-Tree for scalable and memory-light document indexing, and a JBIG2-based query compression scheme, together with Retinex-based enhancement and Otsu-based binarization, to reduce the latency of delivering the query while maintaining query quality in terms of search performance. We have extensively validated the key techniques in this framework by quantitative comparison to alternative approaches.
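
    For reference, the Hamming distance such an index probes between binary descriptors is just a popcount of the XOR; a generic sketch, not tied to the paper's index layout:

        def hamming(a: int, b: int) -> int:
            # Number of differing bits between two binary codes.
            return bin(a ^ b).count("1")

        # e.g. hamming(0b1011, 0b0011) -> 1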

  • An Analytical Approach for Voice Capacity Estimation Over WiFi Network Using ITU-T E-Model

    Page(s): 360 - 372
    PDF (1938 KB) | HTML

    To ensure customer satisfaction and greater market acceptance, voice over Wi-Fi networks must ensure voice quality under various network parameters, configurations and traffic conditions, as well as other practical effects such as channel noise and capture effects. An accurate voice capacity estimation model considering these factors can greatly assist network designers. In the current work, we propose an analytical model to estimate voice over Internet Protocol (VoIP) capacity over Wi-Fi networks addressing these issues. We employ the widely used ITU-T E-model to assess voice quality, and VoIP call capacity is formulated as an optimization problem with a voice quality requirement as a constraint. In particular, we analyze delay and loss in channel access and queueing, and their impacts on voice quality. The proposed capacity model is first developed for a single-hop wireless local area network (WLAN) and then extended to multihop scenarios. To model real network scenarios closely, we also consider channel noise and the capture effect, and analyze the impacts of transmission range, interference range, and WLAN radius. In the absence of any existing call capacity model that considers all the above factors concomitantly, our proposed model should be extremely useful to network designers and voice capacity planners.
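
    The ITU-T E-model referenced here maps impairments to an R-factor and then to a MOS. A simplified, self-contained sketch using the standard G.107 mapping (the paper's own delay/loss analysis would supply the delay_ms and loss_pct inputs; constants below are common defaults, not the paper's values):

        def e_model_mos(delay_ms, loss_pct, ie=0, bpl=34.0, burst_r=1.0, a=0):
            # Delay impairment Id (common approximation of the G.107 curve).
            r_id = 0.024 * delay_ms + (0.11 * (delay_ms - 177.3)
                                       if delay_ms > 177.3 else 0.0)
            # Effective equipment impairment under (possibly bursty) loss.
            ie_eff = ie + (95 - ie) * loss_pct / (loss_pct / burst_r + bpl)
            r = 93.2 - r_id - ie_eff + a
            # Standard R-to-MOS conversion.
            if r < 0:
                return 1.0
            if r > 100:
                return 4.5
            return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

        # e.g. e_model_mos(150, 1.0) -> about 4.3 (toll quality)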

  • Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation

    Page(s): 373 - 386
    PDF (2488 KB) | HTML

    HodgeRank on random graphs was recently proposed as an effective framework for multimedia quality assessment based on paired comparison methods. With a random design on graphs, it is particularly suitable for large-scale crowdsourcing experiments on the Internet. However, a systematic study of online schemes to deal with the growing streaming and massive data in crowdsourceable scenarios is still lacking. To fill this gap, we propose an online ranking/rating scheme based on stochastic approximation of HodgeRank on random graphs for Quality of Experience (QoE) evaluation, where assessors and rating pairs enter the system in a sequential or streaming fashion. The scheme is shown, in both theory and experiments, to be efficient in obtaining global rankings, exhibiting the same asymptotic performance as batch HodgeRank under a general edge-independent sampling process. Moreover, the proposed framework enables us to monitor topological changes and triangular inconsistency in real time. Among a wide spectrum of choices, two particular types of random graphs are studied in detail: the Erdős-Rényi random graph and the preferential attachment random graph. The former is the simplest i.i.d. (independent and identically distributed) sampling, while the latter may achieve more efficient performance in ranking the top-k items due to its rich-get-richer property. We demonstrate the effectiveness of the proposed framework on the LIVE and IVC databases.
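
    A minimal sketch of a stochastic-approximation update for global scores from streamed paired comparisons, in the spirit of online HodgeRank (the step-size schedule and residual form here are illustrative assumptions, not the paper's exact algorithm):

        import numpy as np

        def online_hodgerank(pairs, n_items, gamma0=0.5):
            # pairs yields (i, j, y): y is the observed degree by which
            # item j is preferred over item i.
            s = np.zeros(n_items)
            for t, (i, j, y) in enumerate(pairs, start=1):
                gamma = gamma0 / t            # diminishing step size
                resid = (s[j] - s[i]) - y     # edge residual
                s[i] += gamma * resid
                s[j] -= gamma * resid
            return s - s.mean()               # scores defined up to a constant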

  • Multimodal Interactive Continuous Scoring of Subjective 3D Video Quality of Experience

    Page(s): 387 - 402
    PDF (3607 KB) | HTML

    People experience a variety of 3D visual programs, such as 3D cinema, 3D TV and 3D games, making it necessary to deploy reliable methodologies for predicting each viewer's subjective experience. We propose a new methodology that we call multimodal interactive continuous scoring of quality (MICSQ). MICSQ is composed of a device interaction process between the 3D display and a separate device (PC, tablet, etc.) used as an assessment tool, and a human interaction process between the subject(s) and the separate device. The scoring process is multimodal, using aural and tactile cues to help engage and focus the subject(s) on their tasks by enhancing neuroplasticity. Recorded human responses to 3D visualizations obtained via MICSQ correlate highly with measurements of spatial and temporal activity in the 3D video content. We have also found that 3D quality of experience (QoE) assessment results obtained using MICSQ are more reliable over a wide dynamic range of content than those obtained by the conventional single stimulus continuous quality evaluation (SSCQE) protocol. Moreover, the wireless device interaction process makes it possible for multiple subjects to assess 3D QoE simultaneously in a large space such as a movie theater, at different viewing angles and distances. We conducted a series of 3D experiments showing the accuracy and versatility of the new system, yielding new findings on visual comfort in terms of disparity and motion, and an interesting relation between naturalness and the depth of field (DOF) of a stereo camera.

  • Multi-Label Learning With Fused Multimodal Bi-Relational Graph

    Page(s): 403 - 412
    PDF (1922 KB) | HTML

    The problem of multi-label image classification using multiple feature modalities is considered in this work. Given a collection of images with partial labels, we first model the association between the different feature modalities and the image labels. These associations are then propagated with a graph diffusion kernel to classify the unlabeled images. Towards this objective, a novel Fused Multimodal Bi-relational Graph representation is proposed, with multiple graphs corresponding to the different feature modalities and one graph corresponding to the image labels. Such a representation allows for effective exploitation of both feature complementarity and label correlation, in contrast with previous work where these two factors are considered in isolation. Furthermore, we provide a solution to learn the weight of each image graph by estimating the discriminative power of the corresponding feature modality. Experimental results with the proposed method on two standard multi-label image datasets are very promising.

  • Discriminative Structure Learning for Semantic Concept Detection With Graph Embedding

    Page(s): 413 - 426
    PDF (2839 KB) | HTML

    Semantic concept detection is a very promising way to manage huge amounts of personal content. In this paper, we propose discriminative structure learning for semantic concept detection with graph embedding. We focus on the task of whole-image categorization and employ graphical model inference based semi-supervised learning (SSL) to detect the semantic category of an image. To effectively extract global features from images, we utilize the spatial pyramid image representation. Then, we perform data warping over the histogram intersection kernel-based graph to learn discriminative features and make image distributions more discriminative for both labeled and unlabeled images. Through data warping, each cluster of images is mapped into a relatively compact cluster, and the clusters become well separated. Moreover, we adopt low-rank representation (LRR) in the embedded space to capture the global discriminative structure from the learned features for label propagation, owing to its ability to capture the global structure of data distributions and its robustness to noise and outliers. Finally, we design a smooth nonlinear detector on the captured global discriminative structure to effectively propagate the concepts of labeled images to unlabeled images. Extensive experiments are conducted on four publicly available databases to verify the superiority of the proposed method over state-of-the-art methods.

  • Sparse Multi-Modal Hashing

    Page(s): 427 - 439
    PDF (2492 KB) | HTML

    Learning hash functions across heterogeneous high-dimensional features is very desirable for many applications involving multi-modal data objects. In this paper, we propose an approach to obtain sparse codesets for data objects across different modalities via joint multi-modal dictionary learning, which we call sparse multi-modal hashing (abbreviated as SM2H). In SM2H, both intra-modality similarity and inter-modality similarity are first modeled by a hypergraph; then multi-modal dictionaries are jointly learned by Hypergraph Laplacian sparse coding. Based on the learned dictionaries, the sparse codeset of each data object is obtained and used for multi-modal approximate nearest neighbor retrieval under a sensitive Jaccard metric. The experimental results show that SM2H outperforms other methods in terms of mAP and Percentage on two real-world data sets.

  • Gaze-Based Relevance Feedback for Realizing Region-Based Image Retrieval

    Page(s): 440 - 454
    PDF (2310 KB) | HTML

    In this paper, a gaze-based Relevance Feedback (RF) approach to region-based image retrieval is presented. The fundamental idea of the proposed method is the iterative estimation of the real-world objects (or their constituent parts) that are of interest to the user, and the subsequent exploitation of this information for refining the image retrieval results. The primary novelties of this work are: a) the introduction of a new set of gaze features for predicting the user's relevance assessment at region level, and b) the design of a time-efficient and effective object-based RF framework for image retrieval. Regarding the interpretation of the gaze signal, a novel set of features is introduced by formalizing the problem from a mathematical perspective, in contrast to the exclusive use of explicitly defined features derived, in principle, from the psychology domain. Apart from temporal attributes, the proposed features also represent the spatial characteristics of the gaze signal, which have not been extensively studied in the literature so far. On the other hand, the developed object-based RF mechanism aims at overcoming the main limitation of region-based RF approaches, i.e., the frequently inaccurate estimation of the regions of interest in the retrieved images. Moreover, the incorporation of a single-camera, image processing-based gaze tracker makes the overall system cost-efficient and portable. As shown by the experimental evaluation on a challenging general-purpose image dataset, the proposed method outperforms representative global- and region-based explicit RF approaches.

  • Resource Allocation for Personalized Video Summarization

    Page(s): 455 - 469
    PDF (3027 KB) | HTML

    We propose a hybrid personalized summarization framework that combines adaptive fast-forwarding and content truncation to generate comfortable and compact video summaries. We formulate video summarization as a discrete optimization problem, in which the optimal summary is determined by adopting Lagrangian relaxation and convex-hull approximation to solve a resource allocation problem. To trade off playback speed against perceptual comfort, we consider information associated with the still content of the scene, which is essential for evaluating the relevance of a video, and information associated with scene activity, which is more relevant to visual comfort. We perform clip-level fast-forwarding by selecting playback speeds from discrete options, which naturally includes content truncation as a special case with infinite playback speed. We demonstrate the proposed summarization framework in two use cases, namely the summarization of broadcast soccer videos and of surveillance videos. Objective and subjective experiments demonstrate the relevance and efficiency of the proposed method.
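
    A minimal sketch of the Lagrangian-relaxation step for such a resource allocation: each clip offers discrete (distortion, duration) options, and bisection on the multiplier meets the summary-length budget. The structure is an assumption for illustration; the paper's actual rate-distortion terms are richer:

        def summarize(clips, budget, lo=0.0, hi=1e3, iters=50):
            # clips: list of option lists, each option a (distortion, duration)
            # pair; an infinite-speed option models content truncation.
            def pick(lam):
                choice = [min(opts, key=lambda o: o[0] + lam * o[1])
                          for opts in clips]
                return choice, sum(o[1] for o in choice)
            for _ in range(iters):
                lam = (lo + hi) / 2
                _, total = pick(lam)
                if total > budget:
                    lo = lam      # summary too long: penalize duration more
                else:
                    hi = lam
            return pick(hi)[0]    # per-clip choices on the feasible side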

  • Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation

    Page(s): 470 - 479
    PDF (2064 KB) | HTML

    Weakly-supervised image segmentation is a challenging problem with multidisciplinary applications in multimedia content analysis and beyond. It aims to segment an image by leveraging its image-level semantics (i.e., tags). This paper presents a weakly-supervised image segmentation algorithm that learns the distribution of spatially structural superpixel sets from image-level labels. More specifically, we first extract graphlets from a given image; these are small graphs consisting of superpixels that encapsulate their spatial structure. Then, an efficient manifold embedding algorithm is proposed to transfer labels from training images onto the graphlets. We further observe that numerous graphlets are redundant and not discriminative with respect to semantic categories; these are discarded by a graphlet selection scheme, as they contribute nothing to the subsequent segmentation. Thereafter, we use a Gaussian mixture model (GMM) to learn the distribution of the selected post-embedding graphlets (i.e., the vectors output by the graphlet embedding). Finally, we propose an image segmentation algorithm, termed representative graphlet cut, which leverages the learned GMM prior to measure the structural homogeneity of a test image. Experimental results show that the proposed approach outperforms state-of-the-art weakly-supervised image segmentation methods on five popular segmentation data sets. Moreover, our approach performs competitively with fully-supervised segmentation models.

  • On the Quality of Service of Cloud Gaming Systems

    Page(s): 480 - 495
    PDF (3210 KB) | HTML

    Cloud gaming, i.e., real-time game playing via thin clients, relieves users from the need to constantly upgrade their computers and resolve incompatibility issues between games and computers. As a result, cloud gaming is generating a great deal of interest among entrepreneurs, venture capitalists, the general public, and researchers. However, given the large design space, it is not yet known which cloud gaming system delivers the best user-perceived Quality of Service (QoS) and what design elements constitute a good cloud gaming system. This study is motivated by the question: how good is the QoS of current cloud gaming systems? Answering this question is challenging because most cloud gaming systems are proprietary and closed, so their internal mechanisms are not accessible to the research community. In this paper, we propose a suite of measurement techniques to evaluate the QoS of cloud gaming systems and demonstrate the effectiveness of our schemes using a case study comprising two well-known cloud gaming systems: OnLive and StreamMyGame. Our results show that OnLive performs better, because it provides adaptable frame rates, better graphic quality, and shorter server processing delays, while consuming less network bandwidth. Our measurement techniques are general and can be applied to any cloud gaming system, so that researchers, users, and service providers may systematically quantify the QoS of these systems. To the best of our knowledge, such a suite of measurement techniques has not previously been presented in the literature.

  • Correlation-Aware Packet Scheduling in Multi-Camera Networks

    Page(s): 496 - 509
    PDF (1955 KB) | HTML

    In multiview applications, multiple cameras acquire the same scene from different viewpoints and generally produce correlated video streams, resulting in large amounts of highly redundant data. In order to save resources, it is critical to properly handle this correlation during the encoding and transmission of the multiview data. In this work, we propose a correlation-aware packet scheduling algorithm for multi-camera networks, where information from all cameras is transmitted over a bottleneck channel to clients that reconstruct the multiview images. The scheduling algorithm relies on a new rate-distortion model that captures the importance of each view in the scene reconstruction. We formulate the optimization of the packet scheduling policies, which adapt to variations in the scene content, and then design a low-complexity scheduling algorithm based on a trellis search that selects the subset of candidate packets to be transmitted for effective multiview reconstruction at the clients. Extensive simulation results confirm the gain of our scheduling algorithm when inter-source correlation information is used in the scheduler, compared to scheduling policies with no correlation information or non-adaptive scheduling policies. We finally show that increasing the optimization horizon in the packet scheduling algorithm improves the transmission performance, especially in scenarios where the level of correlation varies rapidly with time.

  • Investigating Redundant Internet Video Streaming Traffic on iOS Devices: Causes and Solutions

    Page(s): 510 - 520
    PDF (1389 KB) | HTML

    The Internet has witnessed rapidly increasing streaming traffic to various mobile devices. In this paper, through analysis of a server-side workload and experiments in a controlled lab environment, we find that current practice introduces a significant amount of redundant traffic. In particular, for the popular iOS-based mobile devices, accessing popular Internet streaming services typically involves about 10%-70% redundant traffic. Such a practice not only over-utilizes and wastes resources on the server side and in the network (cellular or Internet), but also consumes additional battery power on users' mobile devices and may incur monetary cost. To alleviate this situation without changing the server side or the client side, we design and implement CStreamer, which can work transparently between existing mobile clients and servers. We have implemented a prototype and installed it on Amazon EC2. Experiments conducted with this prototype show that CStreamer can completely eliminate the redundant traffic without degrading users' QoS.

  • Band Codes for Energy-Efficient Network Coding With Application to P2P Mobile Streaming

    Page(s): 521 - 532
    PDF (1865 KB) | HTML

    A key problem in network coding (NC) lies in the complexity and energy consumption associated with packet decoding, which hinders its application in mobile environments. Controlling, and hence limiting, such factors has always been an important but elusive research goal, since the packet degree distribution, the main factor driving the complexity, is altered in a non-deterministic way by the random recombinations at the network nodes. In this paper we tackle this problem with a new approach and propose Band Codes (BC), a novel class of network codes specifically designed to preserve the packet degree distribution during packet encoding, recombination and decoding. BC are random codes over GF(2) that exhibit low decoding complexity and feature a limited, controlled degree distribution by construction, and hence make it possible to apply NC effectively even in energy-constrained scenarios. In particular, in this paper we motivate and describe our new design and provide a thorough analysis of its performance. We provide numerical simulations of BC performance in order to validate the analysis and assess the overhead of BC with respect to a conventional random NC scheme. Moreover, experiments in a real-world application, namely peer-to-peer mobile media streaming using a random-push protocol, show that BC reduce the decoding complexity by a factor of two with a negligible increase in encoding overhead, paving the way for the application of NC to power-constrained devices.
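
    A toy sketch of degree-limited random coding over GF(2) within a band of a generation; the band placement and degree rules below are illustrative assumptions, as the actual distribution-preserving construction is the subject of the paper:

        import random

        def encode_band(packets, band_start, band_size, degree):
            # XOR a random subset of equal-length packets drawn from one band,
            # so the coded packet's degree stays bounded by construction.
            band = packets[band_start:band_start + band_size]
            picks = random.sample(range(len(band)), min(degree, len(band)))
            coded = bytearray(len(band[0]))
            for k in picks:
                for i, byte in enumerate(band[k]):
                    coded[i] ^= byte
            # Return payload plus the indices a decoder would need.
            return bytes(coded), [band_start + k for k in picks]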

  • Multipath Video Real-Time Streaming by Field-Based Anycast Routing

    Page(s): 533 - 540
    PDF (1303 KB) | HTML

    Wireless mesh networking (WMN) for video surveillance offers strong potential for rapid deployment in large communities, where reliability and survivability of real-time streaming are the key performance measures. However, current routing protocols do not provide a robust solution for video transmission requirements such as load balancing under high data-rate and delay constraints. We propose an application of the field-based anycast routing (FAR) protocol, which utilizes rapid routing dynamics inspired by an electrostatic potential field model governed by Poisson's equation. The routing metric takes into account geometric proximity and congestion degree in order to increase the delivery ratio and decrease end-to-end delay, which determine the quality of the delivered video. In addition, FAR protects against node failure with an on-the-fly rerouting process that guarantees the continuity of video streaming. Simulation results show a 100% delivery ratio in congested situations and tolerance to different delay requirements compared with the AODV and FDMR protocols for delivering real-time and non-real-time video surveillance, which establishes FAR as a strong candidate for video transmission over WMNs.
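
    A minimal sketch of the potential-field idea: Jacobi relaxation of a discrete Poisson equation on a grid, with destinations pinned to zero potential; forwarding would follow the steepest descent of the result. The grid abstraction and names are hypothetical, standing in for the paper's network-level formulation:

        import numpy as np

        def potential_field(charge, sink_mask, iters=500):
            # charge: 2D array encoding congestion/load at each cell;
            # sink_mask: boolean array marking destination (anycast) cells.
            u = np.zeros_like(charge, dtype=float)
            for _ in range(iters):
                # Jacobi update for the discrete Poisson equation (h = 1).
                u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                        u[1:-1, :-2] + u[1:-1, 2:] -
                                        charge[1:-1, 1:-1])
                u[sink_mask] = 0.0   # destinations held at zero potential
            return u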

  • Best Practices for QoE Crowdtesting: QoE Assessment With Crowdsourcing

    Page(s): 541 - 558
    PDF (1426 KB) | HTML

    Quality of Experience (QoE) in multimedia applications is closely linked to the end users' perception; therefore, its assessment requires subjective user studies to evaluate the degree of delight or annoyance experienced by the users. QoE crowdtesting refers to QoE assessment using crowdsourcing, where anonymous test subjects conduct subjective tests remotely in their preferred environment. The advantages of QoE crowdtesting lie not only in the reduced time and costs of the tests, but also in a large and diverse panel of international, geographically distributed users in realistic user settings. However, conceptual and technical challenges emerge from the remote test settings. Key issues arising in QoE crowdtesting include the reliability of user ratings and the influence of incentives, payment schemes and the unknown environmental context of the tests on the results. To counter these issues, strategies and methods need to be developed, included in the test design, and implemented in the actual test campaign, while statistical methods are required to identify reliable user ratings and ensure high data quality. This contribution therefore provides a collection of best practices addressing these issues, based on our experience from a large set of QoE crowdtesting studies. The focus of this article is, in particular, on the issue of reliability; we use video quality assessment as an example for the proposed best practices, showing that our recommended two-stage QoE crowdtesting design leads to more reliable results.

  • A Fast HEVC Inter CU Selection Method Based on Pyramid Motion Divergence

    Page(s): 559 - 564
    PDF (885 KB) | HTML

    The newly developed HEVC video coding standard achieves higher compression performance than previous video coding standards such as MPEG-4, H.263 and H.264/AVC. However, HEVC's high computational complexity raises concerns about the computational burden on real-time applications. In this paper, a fast pyramid motion divergence (PMD) based CU selection algorithm is presented for HEVC inter prediction. The PMD features are calculated from estimated optical flow of downsampled frames. Theoretical analysis shows that PMD can be used to help select CU sizes. A k-nearest-neighbor-like method is then used to determine CU splitting. Experimental results show that the fast inter prediction method speeds up inter coding significantly with negligible loss in peak signal-to-noise ratio.
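
    A sketch of a k-nearest-neighbor-style split decision over such features; the feature extraction itself (optical flow on downsampled frames) is outside the sketch, and the names and simple majority vote are illustrative assumptions:

        import numpy as np

        def knn_split_decision(pmd_feat, train_feats, train_labels, k=5):
            # Vote among the k training CUs whose PMD features are closest;
            # label 1 = split the CU further, 0 = keep it as is.
            d = np.linalg.norm(train_feats - pmd_feat, axis=1)
            nearest = np.argsort(d)[:k]
            return int(train_labels[nearest].mean() > 0.5)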

  • On Designing Paired Comparison Experiments for Subjective Multimedia Quality Assessment

    Page(s): 564 - 571
    PDF (1020 KB) | HTML

    This paper investigates the design of paired comparison-based subjective quality assessment experiments for reliable results. In particular, the convergence behavior of the quality scores estimated from paired comparison results is considered. Via an extensive computer simulation experiment, the estimation performance in terms of the root mean squared error, the rank order correlation coefficient, and the change of the estimated scores with respect to the number of subjects is mathematically modeled. Furthermore, it is confirmed that the models coincide with the theoretical convergence behavior. Issues such as the effect of human errors and the underlying distribution of the true quality scores are also examined.
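
    As one concrete instance of estimating quality scores from paired comparison counts, here is a Thurstone Case V sketch; the paper studies the convergence of such estimates as subjects are added, but this particular estimator is an assumption, not necessarily the one the authors use:

        import numpy as np
        from scipy.stats import norm

        def thurstone_scores(wins):
            # wins[i, j] counts how often stimulus i was preferred over j;
            # scores are row means of the inverse-normal of the empirical
            # preference probabilities.
            n = wins + wins.T
            p = np.where(n > 0, wins / np.maximum(n, 1), 0.5)
            p = np.clip(p, 0.02, 0.98)   # guard against infinite z-scores
            return norm.ppf(p).mean(axis=1)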


Aims & Scope

The scope of this periodical covers the various aspects of research in multimedia technology and its applications.

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo