IEEE Transactions on Multimedia

Issue 8 • Dec. 2010

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    Freely Available from IEEE
  • Scalable Intraband and Composite Wavelet-Based Coding of Semiregular Meshes

    Page(s): 773 - 789

    This paper proposes novel scalable mesh coding designs that exploit the intraband or composite statistical dependencies between wavelet coefficients. A Laplacian mixture model is proposed to approximate the distribution of the wavelet coefficients; this model proves more accurate than the commonly employed single Laplacian or generalized Gaussian models. Using the mixture model, we theoretically determine the optimal embedded quantizers for scalable wavelet-based coding of semiregular meshes, showing that the commonly employed successive approximation quantization is acceptable but, in general, not optimal. Novel scalable intraband and composite mesh coding systems are proposed, following an information-theoretic analysis of the statistical dependencies between the coefficients. The wavelet subbands are independently encoded using octree-based coding techniques, and context-based entropy coding employing either intraband or composite models is applied. The proposed codecs provide both resolution and quality scalability, in contrast to the state-of-the-art interband zerotree-based semiregular mesh coding technique, which supports only quality scalability. Additionally, the experimental results show that, on average, the proposed codecs outperform the interband state of the art for both normal and nonnormal meshes. Finally, compared with a zerotree coding system, the proposed coding schemes are better suited to software/hardware parallelism, due to the independent processing of wavelet subbands.
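    The mixture-modeling step can be illustrated with a short, self-contained sketch: a two-component zero-mean Laplacian mixture fitted by EM to synthetic "wavelet coefficients". The component scales, sample sizes, and initialization below are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "wavelet coefficients": a mixture of two zero-mean Laplacians
# with scales 0.5 and 4.0 (hypothetical values for illustration only).
x = np.concatenate([rng.laplace(0, 0.5, 6000), rng.laplace(0, 4.0, 4000)])

def fit_laplacian_mixture(x, k=2, iters=200):
    """EM for a k-component zero-mean Laplacian mixture."""
    w = np.full(k, 1.0 / k)                               # mixing weights
    b = np.quantile(np.abs(x), np.linspace(0.3, 0.9, k))  # scale initialization
    for _ in range(iters):
        # E-step: responsibility of each component for each coefficient
        dens = w / (2 * b) * np.exp(-np.abs(x)[:, None] / b)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and scales from responsibilities
        w = r.mean(axis=0)
        b = (r * np.abs(x)[:, None]).sum(axis=0) / r.sum(axis=0)
    return w, b

w, b = fit_laplacian_mixture(x)
print(w, b)  # fitted scales should land near 0.5 and 4.0
```

    A heavier-tailed second component of this kind is what lets the mixture track real wavelet-coefficient histograms more closely than a single Laplacian.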

  • Mining Group Nonverbal Conversational Patterns Using Probabilistic Topic Models

    Page(s): 790 - 802

    The automatic discovery of group conversational behavior is a relevant problem in social computing. In this paper, we present an approach that defines a novel group descriptor, the bag of group nonverbal patterns (NVPs), computed over brief observations of group interaction, and uses principled probabilistic topic models to discover topics. The bag of group NVPs allows fusion of individual cues and facilitates comparison of groups of varying sizes. The topic models help to cluster group interactions and to quantify how different they are from each other in a formal probabilistic sense. Behavioral topics discovered on the Augmented Multi-Party Interaction (AMI) meeting corpus are shown to be meaningful through human annotation with multiple observers. Our method enables "group behavior-based" retrieval of group conversational segments without any prior labeling.
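    As a toy illustration of the descriptor (the pattern labels and slices below are hypothetical, not the paper's actual NVP vocabulary), a bag of group NVPs is simply a count vector over a fixed vocabulary of observed patterns:

```python
from collections import Counter

# Hypothetical nonverbal pattern labels observed in two slices of a meeting
slices = [["A-speaks", "B-nods", "A-speaks"],
          ["B-speaks", "B-speaks", "A-nods"]]

vocab = sorted({p for s in slices for p in s})            # fixed NVP vocabulary
bags = [[Counter(s)[w] for w in vocab] for s in slices]   # one bag per slice
print(vocab)
print(bags)  # fixed-length count vectors, comparable across groups of any size
```

    Each bag then plays the role of a "document" over the NVP "vocabulary", which is exactly the input format that probabilistic topic models such as LDA expect.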

  • Visualizing Image Collections Using High-Entropy Layout Distributions

    Page(s): 803 - 813

    Mechanisms for visualizing image collections are essential for browsing and exploring their content, especially when metadata are ineffective for retrieval due to the sparsity or esoteric nature of text. An obvious approach is to automatically lay out sets of images in ways that reflect relationships between the items. However, dimensionality reduction methods that map high-dimensional content-based feature distributions to low-dimensional layout spaces often produce displays in which many items are occluded while large regions are empty or only sparsely populated. Furthermore, such methods do not consider the shape of the region of layout space to be populated. This paper proposes a method, high-entropy layout distributions, that addresses these limitations. Layout distributions with low differential entropy are penalized, and an optimization strategy is presented that finds layouts with high differential entropy that also reflect inter-image similarities. Efficient optimization is obtained using a step-size constraint and an approximation to quadratic (Renyi) entropy. Two image archives of cultural and commercial importance are used to illustrate and evaluate the method; a comparison with related methods demonstrates its effectiveness.
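    The entropy criterion can be sketched with a Parzen-window estimate of quadratic Renyi entropy over 2-D layout positions (the kernel width and the toy layouts below are illustrative assumptions): a clumped layout, where thumbnails overlap, scores lower entropy than a well-spread one.

```python
import numpy as np

def renyi_quadratic_entropy(y, sigma=0.05):
    """Quadratic (order-2) Renyi entropy estimate of a 2-D layout `y`
    (N x 2) using a Parzen window with an isotropic Gaussian kernel."""
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    s2 = 2 * sigma ** 2                                  # kernel self-convolution width
    g = np.exp(-d2 / (2 * s2)) / (2 * np.pi * s2)
    return -np.log(g.mean())

rng = np.random.default_rng(1)
clumped = rng.normal(0.5, 0.01, (100, 2))  # overlapping thumbnails
spread = rng.uniform(0, 1, (100, 2))       # well-separated thumbnails
print(renyi_quadratic_entropy(clumped) < renyi_quadratic_entropy(spread))  # True
```

    Penalizing low values of this quantity pushes items apart, which is the intuition behind favoring high-entropy layout distributions.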

  • Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

    Page(s): 814 - 828

    Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. Most existing work formulates annotation as a multi-labeling problem over individual shots. However, video is by nature rich in spatial and temporal context among semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike annotation paradigms that work on individual shots, SML predicts a multi-label sequence for consecutive shots in a globally optimized manner by incorporating spatial and temporal context into a unified learning framework. A novel discriminative method, the sequence multi-label support vector machine (SVMSML), is proposed to infer the multi-label sequence for a given shot sequence. In SVMSML, a joint kernel models the feature-level and concept-level context relationships (i.e., the dependence of concepts on low-level features, and the spatial and temporal correlations of concepts). A multiple-kernel learning (MKL) algorithm is developed to optimize the kernel weights of the joint kernel as well as the SML score function. To efficiently search the large output space for the desired multi-label sequence in both training and testing, we adopt an approximate method that maximizes the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML outperforms the state of the art.

  • Towards a Relevant and Diverse Search of Social Images

    Page(s): 829 - 842

    Recent years have witnessed the great success of social media websites. Tag-based image search is an important way of accessing the image content on these sites. However, existing ranking methods for tag-based image search frequently return results that are irrelevant or lack diversity. This paper proposes a diverse relevance ranking scheme that takes both relevance and diversity into account by exploring the content of images and their associated tags. First, it estimates the relevance scores of images with respect to the query term based on both the visual information of images and the semantic information of associated tags. Then, it estimates the semantic similarities of social images based on their tags. Given the relevance scores and similarities, the ranking list is generated by a greedy ordering algorithm that optimizes average diverse precision, a novel measure extended from the conventional average precision. Comprehensive experiments and user studies demonstrate the effectiveness of the approach. We also apply the scheme to web image search reranking, showing that the diversity of search results can be enhanced while maintaining a comparable level of relevance.
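    The flavor of such a greedy ordering can be sketched with an MMR-style trade-off between relevance and similarity to already-ranked items. This is a simplified stand-in for the paper's algorithm (which optimizes average diverse precision); the weight `lam` and the toy scores are assumptions.

```python
import numpy as np

def greedy_diverse_rank(rel, sim, lam=0.7):
    """Greedily order items, trading relevance against the maximum
    similarity to items already ranked (an MMR-style simplification)."""
    ranked, remaining = [], set(range(len(rel)))
    while remaining:
        def score(i):
            redundancy = max((sim[i][j] for j in ranked), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

rel = np.array([0.9, 0.85, 0.5])            # relevance to the query
sim = np.array([[1.0, 0.95, 0.1],           # items 0 and 1 are near-duplicates
                [0.95, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
print(greedy_diverse_rank(rel, sim))  # [0, 2, 1]: the near-duplicate is demoted
```

    A relevance-only ranking would return [0, 1, 2]; the diversity term pushes the redundant item below a less relevant but novel one.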

  • Information-Theoretic Analysis of Input Strokes in Visual Object Cutout

    Page(s): 843 - 852

    Semantic object cutout serves as a basic unit in various image editing systems. In a typical scenario, users provide several strokes that mark some pixels as image background or object. However, most existing approaches are passive in the sense of accepting input strokes without checking their consistency with the user's intention. Here we argue that an active strategy may reduce the interaction burden: before any real segmentation computation, the program can roughly estimate the uncertainty of each image element and actively offer useful suggestions to users. Such pre-processing is particularly useful for beginners unaware of how to feed the underlying cutout algorithm optimal strokes. We develop such an active object cutout algorithm, named ActiveCut, which automatically detects ambiguity given the current user-supplied strokes and synthesizes "suggestive strokes" as feedback. Suggestive strokes come from ambiguous image parts and have the greatest potential to reduce label uncertainty; users can continuously refine their inputs by following them, greatly reducing the number of user-program interaction iterations. Specifically, uncertainty is modeled by the mutual information between user strokes and unlabeled image regions. To ensure that ActiveCut works at an interactive rate, we adopt a superpixel-lattice image representation, whose computation depends on scene complexity rather than the original image resolution; moreover, it retains the 2-D lattice topology and is thus well suited to parallel computing. For the most time-consuming calculation, the probabilistic entropy, variational approximation is used for acceleration. Finally, based on submodular function theory, we provide a theoretical analysis of the performance lower bound of the proposed greedy algorithm. Various user studies on the MSRC image dataset validate the effectiveness of the proposed algorithm.
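    The core intuition, finding the regions where a new stroke is most informative, can be sketched with a toy binary-entropy criterion over per-superpixel foreground probabilities. This is a simplified stand-in for ActiveCut's mutual-information computation, and the probabilities below are hypothetical.

```python
import numpy as np

def most_ambiguous_regions(p_fg, k=2):
    """Rank regions by binary label entropy H(p) = -p log p - (1-p) log(1-p);
    the highest-entropy regions are where a suggestive stroke helps most."""
    p = np.clip(p_fg, 1e-9, 1 - 1e-9)          # avoid log(0)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return np.argsort(-h)[:k]

p_fg = np.array([0.02, 0.5, 0.97, 0.45, 0.9])  # P(foreground) per superpixel
print(most_ambiguous_regions(p_fg))  # regions with p nearest 0.5 rank first
```

    Regions the current strokes already decide confidently (p near 0 or 1) contribute little, so the suggestions concentrate on genuinely ambiguous areas.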

  • Video Précis: Highlighting Diverse Aspects of Videos

    Page(s): 853 - 868

    Summarizing long unconstrained videos is gaining importance in surveillance, web-based video browsing, and video-archival applications. Summarizing a video requires one to identify key aspects that contain the essence of the video. In this paper, we propose an approach that optimizes two criteria that a video summary should embody. The first criterion, “coverage,” requires that the summary be able to represent the original video well. The second criterion, “diversity,” requires that the elements of the summary be as distinct from each other as possible. Given a user-specified summary length, we propose a cost function to measure the quality of a summary. The problem of generating a précis is then reduced to a combinatorial optimization problem of minimizing the proposed cost function. We propose an efficient method to solve the optimization problem. We demonstrate through experiments (on KTH data, unconstrained skating video, a surveillance video, and a YouTube home video) that optimizing the proposed criterion results in meaningful video summaries over a wide range of scenarios. Summaries thus generated are then evaluated using both quantitative measures and user studies.
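    A coverage-plus-diversity cost of this kind can be sketched in a few lines. The specific terms and the trade-off weight `alpha` below are illustrative assumptions, not the paper's cost function: coverage is the mean distance of every frame to its nearest exemplar, and diversity (subtracted) is the mean pairwise distance between exemplars.

```python
import numpy as np
from itertools import combinations

def summary_cost(X, S, alpha=0.5):
    """Cost of candidate summary S (indices into frame features X):
    low coverage distance and high exemplar diversity are both rewarded."""
    ex = X[S]
    cov = np.min(np.linalg.norm(X[:, None] - ex[None], axis=-1), axis=1).mean()
    div = np.mean([np.linalg.norm(X[a] - X[b]) for a, b in combinations(S, 2)])
    return cov - alpha * div

# Toy "frames" from two visual clusters: a summary with one exemplar per
# cluster should cost less than two exemplars from the same cluster.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(3, 0.1, (20, 8))])
print(summary_cost(X, [0, 25]) < summary_cost(X, [0, 1]))  # True
```

    Minimizing such a cost over all size-k subsets is the combinatorial problem the paper's efficient method targets.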

  • Dynamic FEC Algorithms for TFRC Flows

    Page(s): 869 - 885

    Media flows coexist with TCP-based data traffic on the Internet and are required to be TCP-friendly. The TCP protocol slowly increases its sending rate until episodes of congestion occur, and then it quickly reduces its rate to remove congestion. However, media flows can be sensitive to even brief episodes of congestion. In this paper, we are interested in protecting media flows from TCP-induced congestion while maintaining their TCP friendliness. In particular, we consider media flows carried over the TCP-Friendly Rate Control (TFRC) protocol and we design algorithms that dynamically adapt the level of forward error correction (FEC) based on the congestion state of the network. To this end, first, we investigate the loss and delay characteristics of TFRC flows in several TCP-induced congestion scenarios, and we develop novel predictors of loss events based on packet delay information. Second, we use these predictors to dynamically adapt the level of FEC protection based on the predicted level of congestion. We show that this technique can significantly improve the overhead versus reliability trade-off compared to fixed FEC. Third, we select the FEC and original media packets within each FEC block, in a rate-distortion optimized way, and we show that this technique significantly improves media quality. View full abstract»
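    The adaptation idea can be sketched with a simple block-code model: given a predicted loss probability, choose the smallest number of repair packets so that the block decodes with high probability. This sketch assumes i.i.d. packet loss and an ideal (k + r, k) erasure code; the paper's predictors are delay-based and more sophisticated.

```python
from math import comb

def fec_packets(k, p_loss, target=0.99, n_max=64):
    """Smallest repair count r such that a (k + r, k) erasure-coded block
    recovers with probability >= target under i.i.d. loss rate p_loss."""
    for r in range(n_max):
        n = k + r
        # the block is recoverable iff at most r of its n packets are lost
        p_ok = sum(comb(n, i) * p_loss**i * (1 - p_loss)**(n - i)
                   for i in range(r + 1))
        if p_ok >= target:
            return r
    return n_max

# Higher predicted congestion -> more repair packets per 10-packet block
print(fec_packets(10, 0.01), fec_packets(10, 0.10))  # 1 4
```

    Recomputing `r` as the congestion prediction changes is the essence of dynamic (as opposed to fixed) FEC: protection overhead is paid only when loss is actually expected.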

  • Multihop Packet Delay Bound Violation Modeling for Resource Allocation in Video Streaming Over Mesh Networks

    Page(s): 886 - 900

    Resource allocation plays a critical role in multisession video streaming over mesh networks to maximize the overall video presentation quality under transmission delay and network resource constraints. A critical component of efficient resource allocation is to analyze and model the multihop queuing behavior along the transmission path, estimate the packet loss ratio due to delay bound violation, and predict the amount of video quality degradation after multihop video transmission. In this work, we develop a multihop packet delay bound violation model to predict the packet loss probability and end-to-end distortion for video streaming over multihop networks. To this end, we extract salient features that characterize the input source and the conditions of links along the transmission path, and construct a learning-based model using an artificial neural network (ANN). Based on this model, we then formulate resource allocation as a nonconvex optimization problem that minimizes the overall video distortion while maintaining fairness between sessions, and solve it using Lagrangian duality methods. Extensive experimental results demonstrate that, with the widely used offline-training/online-estimation mechanism, the proposed model is applicable to a broad range of network conditions and provides accurate estimates compared with other models on a given sample dataset. The proposed optimization algorithm achieves more efficient resource allocation than existing schemes.

  • TURINstream: A Totally pUsh, Robust, and effIcieNt P2P Video Streaming Architecture

    Page(s): 901 - 914

    This paper presents TURINstream, a novel P2P video streaming architecture designed to jointly achieve low delay, robustness to peer churn, limited protocol overhead, and quality-of-service differentiation based on peer cooperation. Separate control and video overlays are maintained by peers organized in clusters, which represent sets of collaborating peers. Clusters are created by a distributed algorithm and permit exploitation of the participating nodes' upload capacity. The video is conveyed with a push mechanism that exploits the advantages of multiple description coding. The TURINstream design was optimized through an event-driven overlay simulator able to scale to tens of thousands of peers. A complete prototype of TURINstream has been developed, deployed, and tested on PlanetLab under varying degrees of peer churn, flash-crowd arrivals, sudden massive departures, and limited upload bandwidth. TURINstream fulfills our initial design goals, showing low average connection, startup, and playback delays, a high continuity index, low control overhead, and effective quality-of-service differentiation in all tested scenarios.

  • List of Reviewers

    Page(s): 915 - 917
    Freely Available from IEEE
  • IEEE Transactions on Multimedia EDICS

    Page(s): 918
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for authors

    Page(s): 919 - 920
    Freely Available from IEEE
  • Call for Papers - Special Issue on Robust Measures and Tests Using Sparse Data for Detection and Elimination

    Page(s): 921
    Freely Available from IEEE
  • 2010 Index IEEE Transactions on Multimedia Vol. 12

    Page(s): 922 - 932
    Freely Available from IEEE
  • IEEE Transactions on Multimedia society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical covers research in multimedia technology and its applications.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo