
Multimedia and Expo (ICME), 2012 IEEE International Conference on

Date 9-13 July 2012


  • [Back cover]

    Publication Year: 2012 , Page(s): C4
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2012 , Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2012 , Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2012 , Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2012 , Page(s): v - xviii
    Freely Available from IEEE
  • Message from the ICME 2012 General Chairs

    Publication Year: 2012 , Page(s): xix - xx
    Freely Available from IEEE
  • Message from the ICME 2012 Technical Program Chairs

    Publication Year: 2012 , Page(s): xxi - xxii
    Freely Available from IEEE
  • Organizing Committee

    Publication Year: 2012 , Page(s): xxiii - xxiv
    Freely Available from IEEE
  • Steering Committee

    Publication Year: 2012 , Page(s): xxv
    Freely Available from IEEE
  • Technical tracks

    Publication Year: 2012 , Page(s): xxvi
    Freely Available from IEEE
  • Reviewers

    Publication Year: 2012 , Page(s): xxvii - xxxiv
    Freely Available from IEEE
  • Sponsors

    Publication Year: 2012 , Page(s): xxxv
    Freely Available from IEEE
  • Toward transparent telepresence systems

    Publication Year: 2012 , Page(s): xxxvi - xli

    Summary form only given. Dreams of telepresence are fed by special effects in movies, on stage, and even in mainstream news programs. These illusions may satisfy most passive viewers, but do not work for the actual distant participants. Even today's best "Telepresence" systems have difficulty supporting such simple capabilities as eye contact and gaze awareness among these multiple distant participants. This talk will review some component technologies needed to achieve natural, or as some would say "transparent", telepresence (3D acquisition, tracking, rendering, 3D display), will present some recent progress, and will outline several promising future directions.

  • A Hierarchical Model for Human Interaction Recognition

    Publication Year: 2012 , Page(s): 1 - 6

    Recognizing human interactions is a challenging task due to partially occluded body parts and motion ambiguities in interactions. We observe that the interdependencies existing at both the action level and the body part level greatly help disambiguate similar individual movements and facilitate human interaction recognition. In this paper, we propose a novel hierarchical model to capture such interdependencies for recognizing interactions of two persons. We model the action of each person by a large-scale global feature and several body part features. Two types of contextual information are exploited in our model to capture the implicit and complex interdependencies between the interaction class, the action classes of the two persons and the labels of the persons' body parts. We build a challenging human interaction dataset to test our method. Results show that our model is quite effective in recognizing human interactions.

  • Social Image Tagging by Mining Sparse Tag Patterns from Auxiliary Data

    Publication Year: 2012 , Page(s): 7 - 12

    User-given tags associated with social images from photo-sharing websites (e.g., Flickr) are valuable auxiliary resources for the image tagging task. However, social images often suffer from noisy and incomplete tags, heavily degrading the effectiveness of previous image tagging approaches. To alleviate the problem, we introduce a Sparse Tag Patterns (STP) model to discover noiseless and complementary co-occurrence tag patterns from large-scale user-contributed tags among auxiliary web data. To achieve compactness and discriminability, we formulate the STP model as the problem of minimizing a quadratic loss function regularized by a bi-layer ℓ1 norm. We treat the learned STP as a universal knowledge base and verify its superiority within a data-driven image tagging framework. Experimental results over 1 million auxiliary data demonstrate superior performance of the proposed method compared to the state-of-the-art.

  • Learning Global and Reconfigurable Part-Based Models for Object Detection

    Publication Year: 2012 , Page(s): 13 - 18
    Cited by:  Papers (1)

    This paper presents a method of learning global and reconfigurable part-based models (RPM) for object detection. Recently, the deformable part-based model (DPM) has been widely used. A DPM consists of a root node and a collection of part nodes, and is learned under the latent SVM formulation by treating part nodes as hidden variables. Although the configuration of parts (i.e., the shapes, sizes and locations of parts) plays a major role in improving the performance of object detection, it has not been addressed well in the literature. In this paper, we propose RPM to tackle it. A dictionary of part types is defined by enumerating rectangular shapes of different aspect ratios and sizes given the whole lattice (often at twice the resolution of the root node), and each part type has a set of part instances when placed in the lattice. The configuration space of parts is thus quantized by the part types and part instances, and then organized into a hierarchical And-Or directed acyclic graph (AOG). The AOG consists of three types of nodes: terminal nodes (i.e., part instances), And-nodes (representing decompositions of a part instance into two smaller ones) and Or-nodes (representing alternative ways of decomposition). The globally optimal configuration in the AOG is solved using dynamic programming (DP), where the classification error rates of terminal nodes and And-nodes are used as their figures of merit. In experiments, we test our method on the 20 object categories in the PASCAL VOC2007 dataset and obtain performance comparable with state-of-the-art methods.

  • Spiking and Blocking Events Detection and Analysis in Volleyball Videos

    Publication Year: 2012 , Page(s): 19 - 24

    In volleyball matches, spiking is the most effective way to gain points, while blocking is the action that prevents the opponents from scoring by spiking. In this paper, we propose an intelligent system for automatic spiking event detection and blocking pattern classification in real volleyball videos. First, the entire videos are segmented into clips of rallies by whistle detection. Then, we find the court region based on proper camera calibration, and detect the location of the net for judging the positions of spiking and blocking. By analyzing the changes of moving pixels along the net, we place a bounding box around the blocking location and classify the blocking patterns into two main categories based on the width of the bounding box. Finally, two important tactic patterns, delayed spiking and alternate position spiking, are recognized. With the information on spiking events and blocking locations, we can collect statistical data and infer tactics easily. To the best of our knowledge, no previous work has focused on spiking or blocking event detection. The experimental results on videos recorded by a university volleyball team are promising and demonstrate the effectiveness of our proposed scheme.

  • Recognition of Multiple-Food Images by Detecting Candidate Regions

    Publication Year: 2012 , Page(s): 25 - 30
    Cited by:  Papers (6)

    In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect several candidate regions by fusing the outputs of several region detectors, including Felzenszwalb's deformable part model (DPM) [1], a circle detector and the JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to the bounding boxes of the candidate regions, using various kinds of visual features including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradients (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for multiple-food images in descending order of the confidence scores. As a result, we achieved a 55.8% classification rate on a multiple-food image data set, which improved on the baseline result obtained using only DPM by 14.3 points. This demonstrates that the proposed two-step method is effective for recognition of multiple-food images.

  • Discovering Social Photo Navigation Patterns

    Publication Year: 2012 , Page(s): 31 - 36
    Cited by:  Papers (1)

    In general, user browsing behavior has been examined within specific tasks (e.g., search), or in the context of particular web sites or services (e.g., in shopping sites). However, with the growth of social networks and the proliferation of many different types of web services (e.g., news aggregators, blogs, forums, etc.), the web can be viewed as an ecosystem in which a user's actions in a particular web service may be influenced by the service she arrived from (e.g., are users' browsing patterns similar if they arrive at a website via search or via links in aggregators?). In particular, since photos in services like Flickr are used extensively throughout the web, it is common for visitors to the site to arrive via links in many different types of web sites. In this paper, we start from the hypothesis that visitors to social sites such as Flickr behave differently depending on where they come from. For this purpose, we analyze a large sample of Flickr user logs to discover social photo navigation patterns. More specifically, we classify pages within Flickr into different categories (e.g., "add a friend page," "single photo page," etc.), and by clustering sessions discover important differences in social photo navigation that manifest themselves depending on the type of site users visit before visiting Flickr. Our work examines photo navigation patterns in Flickr for the first time taking into account the referrer domain. Our analysis is useful in that it can contribute to a better understanding of how people use photo services like Flickr, and it can be used to inform the design of user modeling and recommendation algorithms, among others.

  • Group Recommendation Using External Followee for Social TV

    Publication Year: 2012 , Page(s): 37 - 42
    Cited by:  Papers (1)

    Group recommendation plays a significant role in Social TV systems, where online friends form temporary groups to enjoy watching video together and interact with each other. Online microblogging systems introduce the "following" relationship, which reflects the common interests between users in a group and external representative followees outside the group. Traditional group recommendation only considers internal group members' preferences and their relationships. In our study, we measure the external followees' impact on group interest and establish a group preference model based on external experts' guidance for group recommendation. In addition, we take advantage of the video currently being watched to improve context-aware recommendations. Experimental results show that our solution works much better than traditional approaches in situations with high group dynamics and inactive group members.

  • Multimodal Location Estimation of Consumer Media: Dealing with Sparse Training Data

    Publication Year: 2012 , Page(s): 43 - 48
    Cited by:  Papers (2)

    This article describes a novel approach to the problem of associating geo-locations with consumer-produced multimedia data such as videos and photos that are publicly available on social networking websites such as Flickr. We specifically focus on the case where the available training data is sparse, both in absolute numbers and in geographic coverage, when compared to the number of untagged query data. We develop a novel graphical-model-based framework for the problem of interest and pose the problem of geotagging as one of inference over this graph. The novelty of our algorithm lies in the fact that we jointly estimate the geo-locations of all the query videos, which helps obtain performance improvements over existing algorithms in the literature that process each query video independently. Our system enables the query videos to act as "virtual" training data that effectively bootstrap the geo-tagging process. The quality of the database improves with each additional query video in the system. Further, our modeling provides a generic theoretical framework that can be used to incorporate any other available textual, visual or audio features. We evaluate our algorithm on the MediaEval 2011 Placing Task data set and show that for fixed training data the system performance improves with an increasing number of unlabeled test data. The performance gains are shown to be over 10% as compared to existing algorithms in the literature.

  • Empowering Cross-Domain Internet Media with Real-Time Topic Learning from Social Streams

    Publication Year: 2012 , Page(s): 49 - 54
    Cited by:  Papers (4)

    This paper aims to connect social media from disparate sources on the Internet by building a common topic space between them, with which cross-domain media recommendations can be realized on the web. The topic space is built and updated in real time by extending the Latent Dirichlet Allocation (LDA) model to cater to streaming online data. Our topical model, named Online Streaming LDA (OSLDA), is able to extract, learn, populate, and update the topic space in real time, scaling with streaming tweets. Based on the proposed topic space learned in real time, we present media recommendation applications that cannot be achieved by conventional media analysis techniques: (1) tweet enrichment by recommending related videos, and (2) popular video recommendation for featuring socially trending topical videos. We conduct experiments over a collection of 3.6 million tweets and 1.2 million click-through data from a video search engine. Our results show that the learned topic model plays a natural role in connecting cross-domain social media, leading to a better user experience when consuming social media.

  • Media Lifecycle and Content Analysis in Social Media Communities

    Publication Year: 2012 , Page(s): 55 - 60
    Cited by:  Papers (1)

    This paper examines the role of content analysis in media-rich online communities. We highlight changes in the multimedia generation and consumption process that have occurred over the past decade, and discuss several new angles this has brought to multimedia analysis research. We first examine the content production, dissemination and consumption patterns in the recent social media studies literature. We then propose an updated conceptual summary of the media lifecycle from a previous research column by Chang. We present an updated list of impact criteria and challenge areas for multimedia content analysis. Among the three criteria, two are existing but come with new problems and solutions, and one is new as a result of the community-driven content lifecycle. We present three case studies that address the impact criteria, and conclude with an outlook for emerging problems.

  • A Fast and Performance-Maintained Transcoding Method Based on Background Modeling for Surveillance Video

    Publication Year: 2012 , Page(s): 61 - 66
    Cited by:  Papers (3)

    Low-complexity and high-performance surveillance video transcoding methods play an important role in a wide range of surveillance video transmission and storage applications. Towards this end, the special characteristics of surveillance video should be utilized for transcoding. In this paper, we propose a fast and performance-maintained transcoding method. This method first divides macroblocks (MBs) into foreground MBs, foreground border MBs and background MBs. Statistics show that the three categories have different distributions of prediction modes, motion vectors and reference frames. Following this, we adopt different transcoding strategies in terms of removing the redundant prediction modes, narrowing the motion search range and reducing reference frames. In particular, we propose an algorithm that exploits the decoded motion vector to adaptively calculate the motion search range. Experimental results show that, compared with the recent background-modeling-based full-decoding-full-encoding approach, our transcoding method saves more than 93% of the time with negligible quality loss.

  • A Unified Estimation-Theoretic Framework for Error-Resilient Scalable Video Coding

    Publication Year: 2012 , Page(s): 67 - 72

    A novel scalable video coding (SVC) scheme is proposed for video transmission over lossy networks, which builds on an estimation-theoretic (ET) framework for optimal prediction and error concealment, given all available information from both the current base layer and prior enhancement layer frames. It incorporates a recursive end-to-end distortion estimation technique, namely, the spectral coefficient-wise optimal recursive estimate (SCORE), which accounts for all ET operations and tracks the first and second moments of decoder-reconstructed transform coefficients. The overall framework enables optimization of ET-SVC systems for transmission over lossy networks, while accounting for all relevant conditions including the effects of quantization, channel loss, concealment, and error propagation. It thus resolves longstanding difficulties in combining truly optimal prediction and concealment with optimal end-to-end distortion and error-resilient SVC coding decisions. Experiments demonstrate that the proposed scheme offers substantial performance gains over existing error-resilient SVC systems, under a wide range of packet loss and bit rates.
