IEEE Transactions on Multimedia

Issue 1 • Date Feb. 2011


Displaying Results 1 - 22 of 22
  • Table of contents

    Publication Year: 2011, Page(s): C1 - C4
    PDF (46 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Publication Year: 2011, Page(s): C2
    PDF (36 KB)
    Freely Available from IEEE
  • Editorial

    Publication Year: 2011, Page(s): 1
    PDF (203 KB) | HTML
    Freely Available from IEEE
  • Effective Pseudonoise Sequence and Decoding Function for Imperceptibility and Robustness Enhancement in Time-Spread Echo-Based Audio Watermarking

    Publication Year: 2011, Page(s): 2 - 13
    Cited by: Papers (14)
    PDF (684 KB) | HTML

    This paper proposes an effective pseudonoise (PN) sequence and a corresponding decoding function for time-spread echo-based audio watermarking. Unlike the traditional PN sequence used in time-spread echo hiding, the proposed PN sequence has two features. First, the echo kernel resulting from the new PN sequence has frequency characteristics with smaller magnitudes in the perceptually significant region, which leads to higher perceptual quality. Second, the correlation function of the new PN sequence has three times as many large peaks as that of the existing PN sequence. Based on this feature, we propose a new decoding function to improve the robustness of time-spread echo-based audio watermarking. The effectiveness of the proposed PN sequence and decoding function is illustrated by theoretical analysis, simulation examples, and listening tests.

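The basic time-spread echo scheme the abstract builds on can be sketched in a few lines: the hidden bit flips the sign of a PN-spread echo added to the host, and the decoder correlates the real cepstrum of the received signal with the PN sequence at the echo delay. A minimal illustration follows; the delay, echo amplitude, and sequence length are hypothetical, and this uses the traditional binary PN construction rather than the paper's improved sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
pn = np.where(rng.random(63) < 0.5, 1.0, -1.0)  # binary PN sequence (illustrative)
host = rng.standard_normal(8192)                # stand-in for one audio frame

def embed(signal, pn, delay, alpha, bit):
    """Add a PN-spread echo; the hidden bit flips the echo's sign."""
    kernel = np.zeros(delay + len(pn))
    kernel[delay:] = bit * alpha * pn           # time-spread echo taps
    echo = np.convolve(signal, kernel)[:len(signal)]
    return signal + echo

def decode(signal, pn, delay):
    """Correlate the real cepstrum with the PN sequence at the echo delay."""
    spectrum = np.fft.fft(signal)
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    segment = cepstrum[delay:delay + len(pn)]
    return 1 if np.dot(segment, pn) > 0 else -1
```

On a clean signal the cepstral correlation peak is strong; robustness under attacks is where the paper's PN design and new decoding function come in.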
  • Collaborative Face Recognition for Improved Face Annotation in Personal Photo Collections Shared on Online Social Networks

    Publication Year: 2011, Page(s): 14 - 28
    Cited by: Papers (7)
    PDF (1731 KB) | HTML

    Using face annotation for effective management of personal photos in online social networks (OSNs) is currently of considerable practical interest. In this paper, we propose a novel collaborative face recognition (FR) framework, improving the accuracy of face annotation by effectively making use of multiple FR engines available in an OSN. Our collaborative FR framework consists of two major parts: selection of FR engines and merging (or fusion) of multiple FR results. The selection of FR engines aims at determining a set of personalized FR engines that are suitable for recognizing query face images belonging to a particular member of the OSN. For this purpose, we exploit both social network context in an OSN and social context in personal photo collections. In addition, to take advantage of the availability of multiple FR results retrieved from the selected FR engines, we devise two effective solutions for merging FR results, adopting traditional techniques for combining multiple classifier results. Experiments were conducted using 547 991 personal photos collected from an existing OSN. Our results demonstrate that the proposed collaborative FR method is able to significantly improve the accuracy of face annotation, compared to conventional FR approaches that only make use of a single FR engine. Further, we demonstrate that our collaborative FR framework has a low computational cost and comes with a design that is suited for deployment in a decentralized OSN.

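The "merging of multiple FR results" step uses classical classifier-combination techniques. One of the simplest is the normalized sum rule, sketched below; the per-engine score dictionaries and the optional weights are illustrative, not the paper's exact formulation.

```python
def merge_face_results(engine_scores, weights=None):
    """Sum-rule fusion of face recognition results.

    engine_scores: list of {identity: confidence} dicts, one per FR engine.
    Each engine's scores are normalized so no single engine dominates, then
    summed (optionally weighted); the highest fused score wins.
    """
    if weights is None:
        weights = [1.0] * len(engine_scores)
    fused = {}
    for w, scores in zip(weights, engine_scores):
        total = sum(scores.values()) or 1.0
        for ident, s in scores.items():
            fused[ident] = fused.get(ident, 0.0) + w * s / total
    return max(fused, key=fused.get)
```

For example, three engines that individually disagree can still fuse to a confident answer when two of them lean the same way.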
  • Text-Video Completion Using Structure Repair and Texture Propagation

    Publication Year: 2011, Page(s): 29 - 39
    Cited by: Papers (1)
    PDF (1452 KB) | HTML

    Increasing amounts of superimposed text are embedded in videos, and some of this text is unnecessary, so an approach is needed to remove the text and complete the video. However, few conventional approaches complete the video well in the presence of large text, structured regions, and the wide variety of video types. In response, this study designs a text-video completion algorithm that poses the task as structure repair and texture propagation. To repair structure regions, the structure interpolation step uses rotated block matching to estimate the initial location of each completed region and then refines its coordinates; information from neighboring frames then fills the structure regions. To complete structure regions without tedious manual interaction, the structure extension step uses spline curve estimation. Derivative propagation then completes the texture regions. Experiments on several real TV programs show that all text regions were completed with spatio-temporal consistency. Comparisons also show that the proposed algorithm outperforms conventional approaches; its advantages include reduced design complexity, by integrating only the multi-frame structure information, and demonstrated structure consistency on realistic videos.

  • A Fuzzy Clustering Algorithm for Virtual Character Animation Representation

    Publication Year: 2011, Page(s): 40 - 49
    Cited by: Papers (3)
    PDF (1877 KB) | HTML

    The use of realistic humanoid animations generated through motion capture (MoCap) technology is widespread across various 3-D applications and industries. However, existing compression techniques for such representations often do not consider the implicit coherence within the anatomical structure of a human skeletal model and lack portability for transmission. In this paper, a novel concept, the virtual character animation image (VCAI), is proposed. Built upon a fuzzy clustering algorithm, the data similarity within the anatomical structure of a virtual character (VC) model is considered jointly with the temporal coherence of the motion data to achieve efficient data compression. Since the animation is mapped to an image, image processing tools can be used for efficient compression and delivery of such content across dynamic networks. A modified motion filter (MMF) is proposed to minimize visual discontinuity in the animation's motion due to quantization and transmission errors. The MMF removes high-frequency noise components and smooths the motion signal, providing perceptually improved animation with less distortion. Simulation results show that the proposed algorithm is competitive in compression efficiency and decoded quality against state-of-the-art virtual character animation compression methods, making it suitable for providing quality animation to low-powered mobile devices.

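The clustering machinery underneath the VCAI construction is standard fuzzy c-means: soft memberships, weighted centroid updates, and inverse-distance membership updates. A generic sketch is below; the paper clusters joint anatomical/temporal similarity of skeleton data, whereas this example just clusters plain 2-D points.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: returns (cluster centers, membership matrix U).

    m > 1 is the fuzzifier; each row of U holds a point's memberships
    across the c clusters and sums to 1.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)                     # random soft memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]      # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))           # inverse-distance update
        U /= U.sum(1, keepdims=True)
    return centers, U
```

On well-separated data the soft memberships quickly become near-crisp, which is what makes the cluster structure exploitable for compression.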
  • Training Surrogate Sensors in Musical Gesture Acquisition Systems

    Publication Year: 2011, Page(s): 50 - 59
    Cited by: Papers (1)
    PDF (1195 KB) | HTML

    Capturing the gestures of music performers is a common task in interactive electroacoustic music. The captured gestures can be mapped to sounds, synthesis algorithms, visuals, etc., or used for music transcription. Two of the most common approaches for acquiring musical gestures are: 1) “hyper-instruments” which are “traditional” musical instruments enhanced with sensors for directly detecting the gestures and 2) “indirect acquisition” in which the only sensor is a microphone capturing the audio signal. Hyper-instruments require invasive modification of existing instruments which is frequently undesirable. However, they provide relatively straightforward and reliable sensor measurements. On the other hand, indirect acquisition approaches typically require sophisticated signal processing and possibly machine learning algorithms in order to extract the relevant information from the audio signal. The idea of using direct sensor(s) to train a machine learning model for indirect acquisition is proposed in this paper. The resulting trained “surrogate” sensor can then be used in place of the original direct invasive sensor(s) that were used for training. That way, the instrument can be used unmodified in performance while still providing the gesture information that a hyper-instrument would provide. In addition, using this approach, large amounts of training data can be collected with minimum effort. Experimental results supporting this idea are provided in two detection contexts: 1) strike position on a drum surface and 2) strum direction on a sitar. View full abstract»

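The surrogate-sensor idea is essentially supervised learning: while the direct sensor is attached, each audio-feature vector gets a ground-truth label from the sensor; afterwards, a trained model predicts the label from audio alone. A deliberately tiny 1-nearest-neighbor sketch (the paper's feature extraction and learners are more sophisticated; the feature vectors here are placeholders):

```python
import numpy as np

class SurrogateSensor:
    """Predict a direct-sensor label (e.g., drum strike position) from
    audio features, using 1-nearest-neighbor as a stand-in learner."""

    def fit(self, feats, labels):
        # feats: training audio-feature vectors; labels: direct-sensor readings
        self.feats = np.asarray(feats, float)
        self.labels = list(labels)
        return self

    def predict(self, feat):
        # label of the closest training example in feature space
        d = np.linalg.norm(self.feats - np.asarray(feat, float), axis=1)
        return self.labels[int(np.argmin(d))]
```

Once trained, the physical sensor is removed and `predict` supplies the gesture information during performance.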
  • Empowering Visual Categorization With the GPU

    Publication Year: 2011, Page(s): 60 - 70
    Cited by: Papers (8)
    PDF (729 KB) | HTML

    Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As newer CPU and GPU architectures gain computational power chiefly by increasing their level of parallelism, exploiting this parallelism becomes an important direction for handling the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification that exploit the GPU hardware and the CUDA parallel programming model. The algorithms are designed to (1) keep categorization accuracy intact, (2) decompose the problem, and (3) give the same numerical results. In experiments on large-scale datasets, it is shown that, by using a parallel implementation on the GeForce GTX260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%.

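The quantization bottleneck is nearest-visual-word assignment. The standard trick, which maps well onto GPUs, is to expand the squared distance so the dominant cost becomes one large matrix multiply. A NumPy sketch of that structure (the codebook and descriptors here are toy data, not from the paper):

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest visual word.

    Uses ||d - c||^2 = ||d||^2 - 2 d.c + ||c||^2, so the bulk of the work
    is the matrix product descriptors @ codebook.T -- the operation a GPU
    implementation would parallelize.
    """
    d2 = (descriptors ** 2).sum(1)[:, None]
    c2 = (codebook ** 2).sum(1)[None, :]
    dist = d2 - 2.0 * descriptors @ codebook.T + c2
    return np.argmin(dist, axis=1)

def bow_histogram(descriptors, codebook):
    """Bag-of-words histogram: count descriptors per visual word."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook))
```

Because the assignment is exact (no approximation), a parallel version of this computation gives the same numerical results as the serial one, matching the design goal stated in the abstract.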
  • Exploring Distributional Discrepancy for Multidimensional Point Set Retrieval

    Publication Year: 2011, Page(s): 71 - 81
    Cited by: Papers (1)
    PDF (862 KB) | HTML

    How to assess similarity effectively and efficiently is a long-standing and challenging research problem in various multimedia applications. For ranked retrieval in a collection of objects based on a series of multivariate observations (e.g., searching for video clips similar to a query example), satisfactory performance cannot be achieved with many conventional similarity measures that aggregate element-to-element comparison results. Correlation information among the individual elements has also been investigated to characterize each set of multidimensional points for comparison, but with an unwarranted assumption that the underlying data distribution has a particular parametric form. Motivated by these concerns, this paper approaches the similarity of multidimensional point sets from a novel collective perspective, by evaluating the probability that they are consistent with the same distribution. We propose to use nonparametric hypothesis tests from statistics to compute the distributional discrepancy of samples for assessing the degree of similarity between two ensembles of points. While our proposal is mainly presented in the context of video similarity search, it enjoys great flexibility and is extensible to other applications where multidimensional point set representations are involved, such as motion capture retrieval.

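One concrete nonparametric two-sample test that fits this framework is the Kolmogorov-Smirnov statistic: the maximum gap between empirical CDFs, which makes no parametric assumption about the data. The per-dimension aggregation below is an illustrative simplification, not necessarily the test the paper uses.

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic for 1-D samples:
    max gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    x, y = np.sort(x), np.sort(y)
    allv = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, allv, side="right") / len(x)
    cdf_y = np.searchsorted(y, allv, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def set_discrepancy(P, Q):
    """Discrepancy between two multidimensional point sets:
    worst per-dimension KS statistic (an illustrative aggregation)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return max(ks_statistic(P[:, j], Q[:, j]) for j in range(P.shape[1]))

rng = np.random.default_rng(1)
P = rng.standard_normal((200, 3))      # e.g., frame features of one clip
Q = P + 10.0                           # a clearly different distribution
```

Two clips drawn from the same underlying distribution score near 0; grossly different ones score near 1, giving a ranking signal that needs no distance aggregation over element pairs.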
  • Semi-Automatic Tagging of Photo Albums via Exemplar Selection and Tag Inference

    Publication Year: 2011, Page(s): 82 - 91
    Cited by: Papers (3)
    PDF (1042 KB) | HTML

    As one of the emerging Web 2.0 activities, tagging has become a popular approach to managing personal media data, such as photo albums. A dilemma in tagging is the trade-off between users' manual effort and tagging accuracy: exhaustively tagging all photos in an album is labor-intensive and time-consuming, while simply entering tags for the whole album leads to unsatisfying results. In this paper, we propose a semi-automatic tagging scheme that aims to assist users in photo album tagging. The scheme achieves a good trade-off between manual effort and tagging accuracy and can adjust tagging performance according to the user's customization. For a given album, it first selects a set of representative exemplars for manual tagging via a temporally consistent affinity propagation algorithm, and the tags of the remaining photos are automatically inferred. Then a constrained affinity propagation algorithm is applied to select a new set of exemplars for manual tagging in an incremental manner, from which the performance of the tag inference in the previous round can be estimated. If the results are not yet satisfactory, a further round of exemplar selection and tag inference is carried out. This process repeats until satisfactory tagging results are achieved, and users can also stop the process at any time. Experimental results on real-world Flickr photo albums demonstrate the effectiveness and usefulness of the proposed scheme.

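The tag-inference half of the loop can be pictured as propagating each exemplar's manual tags to the photos nearest it in feature space. This nearest-exemplar sketch is a stand-in for the paper's affinity-propagation machinery, and the feature vectors are hypothetical:

```python
import numpy as np

def infer_tags(photo_feats, exemplar_feats, exemplar_tags):
    """Propagate tags from manually tagged exemplars to remaining photos:
    each photo inherits the tag of its nearest exemplar (1-NN stand-in for
    the paper's affinity-propagation-based inference)."""
    photo_feats = np.asarray(photo_feats, float)
    exemplar_feats = np.asarray(exemplar_feats, float)
    out = []
    for f in photo_feats:
        d = np.linalg.norm(exemplar_feats - f, axis=1)
        out.append(exemplar_tags[int(np.argmin(d))])
    return out
```

Each extra round adds exemplars, so the nearest-exemplar distances shrink and the inferred tags become more reliable, which is the trade-off the user controls.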
  • Unequal Error Protection Using Fountain Codes With Applications to Video Communication

    Publication Year: 2011, Page(s): 92 - 101
    Cited by: Papers (16)
    PDF (1552 KB) | HTML

    Application-layer forward error correction (FEC) is used in many multimedia communication systems to address the problem of packet loss in lossy packet networks. One powerful form of application-layer FEC is unequal error protection, which protects the information symbols according to their importance. We propose a method for unequal error protection with a Fountain code. When the information symbols were partitioned into two protection classes (most important and least important), our method required a smaller transmission bit budget to achieve low bit error rates than two state-of-the-art techniques. We also compared our method to the two state-of-the-art techniques for video unicast and multicast over a lossy network. Simulations for the scalable video coding (SVC) extension of the H.264/AVC standard showed that our method required a smaller transmission bit budget to achieve high-quality video.

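One simple way to realize unequal error protection in a fountain code is to bias the encoder's symbol selection toward the most-important block (MIB), so MIB symbols appear in more output symbols and are recovered from fewer receptions. The sketch below uses that weighted-selection idea with made-up parameters; the paper's exact construction may differ.

```python
import random

def encode_symbol(source_mib, source_lib, p_mib=0.8, degree=3, rng=None):
    """One LT-style output symbol: the XOR of `degree` source symbols,
    each drawn from the most-important block (MIB) with probability p_mib,
    otherwise from the least-important block (LIB).

    Returns (list of (block, index) picks, xor value)."""
    rng = rng or random.Random()
    ids, val = [], 0
    for _ in range(degree):
        if rng.random() < p_mib:
            i = rng.randrange(len(source_mib))
            ids.append(("mib", i))
            val ^= source_mib[i]
        else:
            i = rng.randrange(len(source_lib))
            ids.append(("lib", i))
            val ^= source_lib[i]
    return ids, val

# With p_mib = 0.8, roughly 80% of all selections hit the MIB, even though
# the MIB here is four times smaller than the LIB.
rng = random.Random(0)
picks = [blk for _ in range(2000)
         for blk, _ in encode_symbol(list(range(16)), list(range(64)), rng=rng)[0]]
mib_fraction = picks.count("mib") / len(picks)
```

The decoder then sees many more equations involving MIB symbols, which is what drives the lower bit error rate for the important class at a given bit budget.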
  • Energy-Efficient Multicasting of Scalable Video Streams Over WiMAX Networks

    Publication Year: 2011, Page(s): 102 - 115
    Cited by: Papers (15)
    PDF (645 KB) | HTML

    The Multicast/Broadcast Service (MBS) feature of mobile WiMAX networks is a promising technology for providing wireless multimedia, because it allows the delivery of multimedia content to large-scale user communities in a cost-efficient manner. In this paper, we consider WiMAX networks that transmit multiple video streams encoded in scalable manner to mobile receivers using the MBS feature. We focus on two research problems in such networks: 1) maximizing the video quality and 2) minimizing energy consumption for mobile receivers. We formulate and solve the substream selection problem to maximize the video quality, which arises when multiple scalable video streams are broadcast to mobile receivers with limited resources. We show that this problem is NP-Complete, and design a polynomial time approximation algorithm to solve it. We prove that the solutions computed by our algorithm are always within a small constant factor from the optimal solutions. In addition, we extend our algorithm to reduce the energy consumption of mobile receivers. This is done by transmitting the selected substreams in bursts, which allows mobile receivers to turn off their wireless interfaces to save energy. We show how our algorithm constructs burst transmission schedules that reduce energy consumption without sacrificing the video quality. Using extensive simulation and mathematical analysis, we show that the proposed algorithm: 1) is efficient in terms of execution time, 2) achieves high radio resource utilization, 3) maximizes the received video quality, and 4) minimizes the energy consumption for mobile receivers.

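The energy argument behind burst transmission is simple arithmetic: a burst is received at the (fast) channel rate but played out at the (slower) stream rate, so the radio can sleep for the rest of the burst period. A back-of-the-envelope sketch with illustrative numbers, not figures from the paper:

```python
def sleep_fraction(stream_kbps, burst_kbits, channel_kbps, overhead_ms=2.0):
    """Fraction of time a receiver can keep its radio off when a stream of
    rate stream_kbps is delivered in bursts of burst_kbits over a
    channel_kbps air link (overhead_ms models wake-up cost)."""
    period_s = burst_kbits / stream_kbps                        # playout time of one burst
    on_s = burst_kbits / channel_kbps + overhead_ms / 1000.0    # reception + wake-up
    return max(0.0, 1.0 - on_s / period_s)
```

For example, a 500-kbps stream sent in 1000-kbit bursts over a 10-Mbps link lets the radio sleep about 95% of the time; larger bursts save more energy but cost receiver buffer and startup delay, which is the trade-off the scheduling algorithm manages.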
  • Adaptive Resource Allocation for Layer-Encoded IPTV Multicasting in IEEE 802.16 WiMAX Wireless Networks

    Publication Year: 2011, Page(s): 116 - 124
    Cited by: Papers (11)
    PDF (497 KB) | HTML

    In this paper, we study the utility-based resource allocation problem for layer-encoded IPTV multicast service over WiMAX networks. In this problem, each video stream is encoded into multiple layers. We regard each layer as a multicast subsession. Each layer of a video stream is assigned a utility value, and the number of layers of each program that each user can receive is adjustable. The objective is to maximize the total utility (i.e., all users' satisfaction) and the system resource utilization, subject to users' channel conditions, the popularity of a video program, and the total available radio resource. We design a polynomial-time solution to this problem, and show that the difference in the performance of our proposed mechanism and the optimal solution is tightly bounded. Our mechanism supports both unicast and multicast, and both single-layer and multi-layer environments. Most importantly, it can be integrated with the multicast mechanism defined in the WiMAX standards, and can also be applied to any kind of wireless network which supports adaptive modulation and coding schemes. The performance of our scheme is evaluated by simulation. The simulation results show that this scheme can allocate resource flexibly according to the utility function of each program, the popularity of each program, and the amount of total resource available in the network.

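The flavor of utility-based layer allocation can be shown with a greedy knapsack heuristic: sort candidate layers by utility per unit of radio resource, and admit a layer only if all lower layers of its stream are already admitted. This is an illustrative heuristic, not the paper's bounded-gap algorithm.

```python
def allocate_layers(layers, capacity):
    """Greedy utility-per-resource allocation sketch.

    layers: list of (stream_id, layer_idx, utility, cost) tuples.
    An enhancement layer may only be sent if all lower layers of its
    stream are sent (layered-coding dependency). Returns (chosen, used).
    """
    chosen, used = set(), 0
    # consider layers in decreasing utility density
    for sid, idx, util, cost in sorted(layers, key=lambda l: l[2] / l[3], reverse=True):
        deps_ok = all((sid, j) in chosen for j in range(idx))
        if deps_ok and used + cost <= capacity:
            chosen.add((sid, idx))
            used += cost
    return chosen, used
```

In practice the utility values would fold in channel conditions and program popularity, so popular programs under good channels naturally receive more layers.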
  • Spread Spectrum Visual Sensor Network Resource Management Using an End-to-End Cross-Layer Design

    Publication Year: 2011, Page(s): 125 - 131
    Cited by: Papers (10)
    PDF (637 KB) | HTML

    In this paper, we propose an approach to manage network resources for a direct sequence code division multiple access (DS-CDMA) visual sensor network where nodes monitor scenes with varying levels of motion. It uses cross-layer optimization across the physical layer, the link layer, and the application layer. Our technique simultaneously assigns a source coding rate, a channel coding rate, and a power level to all nodes in the network based on one of two criteria that maximize the quality of video of the entire network as a whole, subject to a constraint on the total chip rate. One criterion results in the minimal average end-to-end distortion amongst all nodes, while the other criterion minimizes the maximum distortion of the network. Our experimental results demonstrate the effectiveness of the cross-layer optimization.

  • Routing-Aware Multiple Description Video Coding Over Mobile Ad-Hoc Networks

    Publication Year: 2011, Page(s): 132 - 142
    Cited by: Papers (3)
    PDF (947 KB) | HTML

    Supporting video transmission over error-prone mobile ad-hoc networks is becoming increasingly important as these networks become more widely deployed. We propose a routing-aware multiple description video coding approach to support video transmission over mobile ad-hoc networks with multiple path transport. We build a statistical model to estimate the packet loss probability of each packet transmitted over the network based on the standard ad-hoc routing messages and network parameters. We then estimate the frame loss probability and dynamically select reference frames in order to alleviate error propagation caused by the packet losses. We conduct experiments using the QualNet simulator that accounts for node mobility, channel properties, MAC operation, multipath routing, and traffic type. The results demonstrate that our proposed method provides 0.7-2.3 dB gains in PSNR for different video sequences under different network settings and guarantees better video quality for a larger number of users of the network. Furthermore, we examine the estimation accuracy of our proposed estimation model and show that our model works effectively under various network settings.

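The step from packet loss probability to frame loss probability, and from there to reference frame selection, can be sketched directly: a frame survives only if all its packets survive (assuming independent losses), and the encoder should reference the already-coded frame most likely to arrive. The data structures here are hypothetical simplifications of the paper's model.

```python
def frame_loss_prob(packet_loss_probs):
    """A frame is lost if any of its packets is lost
    (independent-loss assumption)."""
    p_all_arrive = 1.0
    for q in packet_loss_probs:
        p_all_arrive *= (1.0 - q)
    return 1.0 - p_all_arrive

def pick_reference(candidate_frames):
    """Choose as reference the candidate frame least likely to be lost.

    candidate_frames: {frame_id: [per-packet loss probabilities]}
    (a hypothetical structure for illustration)."""
    return min(candidate_frames, key=lambda f: frame_loss_prob(candidate_frames[f]))
```

Referencing low-loss frames shortens error-propagation chains: a lost reference corrupts every frame predicted from it until the next reliably received anchor.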
  • Application-Aware Design to Enhance System Efficiency for VoIP Services in BWA Networks

    Publication Year: 2011, Page(s): 143 - 154
    Cited by: Papers (4)
    PDF (860 KB) | HTML

    This paper designs a cross-layer framework for voice over Internet protocol (VoIP) services in IEEE 802.16 systems. It uses the application session information of the Session Description Protocol to generate the quality-of-service parameters in IEEE 802.16 systems. This feature allows the system to allocate the radio resource efficiently because it can exactly estimate the properties of VoIP services such as packet size and packet generation interval. In other words, the cross-layer framework achieves a novel resource request scheme for a VoIP service that dynamically assigns the radio resource. We analyze the maximum number of supportable VoIP users for the resource request schemes in terms of the packet generation interval in the silent period, the duration of the silent period, and the major VoIP speech codecs. The numerical results show that the proposed scheme can efficiently support VoIP services in various communication environments. In particular, it can improve the maximum number of supportable VoIP users by 14%-93% compared to an extended real-time polling service.

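Why silence periods matter for capacity is easy to see numerically: during silence, only small comfort-noise (SID) packets are sent, so the average per-user rate drops well below the talk-spurt rate. A back-of-the-envelope sketch with illustrative figures, not the paper's analysis:

```python
def max_voip_users(capacity_kbps, talk_kbps, sid_kbps, activity=0.4):
    """Rough VoIP capacity when the scheduler exploits silence suppression.

    activity: fraction of time a user is in a talk spurt; during silence
    only small SID (comfort-noise) packets at sid_kbps are sent.
    """
    avg_per_user = activity * talk_kbps + (1.0 - activity) * sid_kbps
    return int(capacity_kbps // avg_per_user)
```

With a 1-Mbps share, a 16-kbps talk rate, a 1-kbps SID rate, and 50% voice activity, the average per-user rate is 8.5 kbps, so silence-aware scheduling roughly doubles the user count over always-on reservation: that is the kind of gain the proposed resource request scheme captures.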
  • An Effective Method for Movable Projector Keystone Correction

    Publication Year: 2011, Page(s): 155 - 160
    Cited by: Papers (2)
    Multimedia
    PDF (515 KB)

    Keystone correction is an essential operation for projector-based applications, especially in mobile scenarios. In this paper, we propose a handheld movable projection method that can freely project keystone-free content on a general flat surface without adding any markings or boundaries to it. Such a projection system gives the user greater freedom of display control (such as viewing angle and distance) without suffering from keystone distortion. To achieve this, we attach a camera to the projector to form a camera-projector pair. A green frame with the same resolution as the projector screen is projected onto the surface. A particle filter is employed to track the green frame, and the display content is then corrected by rectifying the projection region of interest into a rectangular area. We built a prototype system to validate the effectiveness of the method. Experimental results show that our method can continuously project distortion-free content in real time with good performance.

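The rectification step, mapping the tracked quadrilateral (the green frame as seen by the camera) onto an upright rectangle, is a planar homography, which four point correspondences determine via the direct linear transform. A sketch with hypothetical corner coordinates; the paper's full pipeline also handles tracking and the camera-projector calibration.

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: 3x3 homography mapping 4 src points to 4 dst
    points. Built from the cross-product constraints, solved via SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)            # null-space vector = homography entries
    return H / H[2, 2]

def apply_h(H, pt):
    """Apply homography H to a 2-D point (homogeneous divide)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[0] / p[2], p[1] / p[2]
```

Warping the display content through the inverse of this homography is what makes the projected image land as an undistorted rectangle despite the projector being held at an angle.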
  • IEEE Transactions on Multimedia EDICS

    Publication Year: 2011, Page(s): 161
    PDF (16 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for authors

    Publication Year: 2011, Page(s): 162 - 163
    PDF (46 KB)
    Freely Available from IEEE
  • Information Forensics and Security - WIFS'11

    Publication Year: 2011, Page(s): 164
    PDF (1198 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia society information

    Publication Year: 2011, Page(s): C3
    PDF (28 KB)
    Freely Available from IEEE

Aims & Scope

The scope of the periodical covers the various aspects of research in multimedia technology and the applications of multimedia.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo