
10th Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '09)

Date: 6-8 May 2009


Displaying Results 1 - 25 of 79
  • [Front cover]

    Page(s): i
  • [Copyright notice]

    Page(s): ii
  • Bilinear Decomposition of 3-D face images: An application to facial expression recognition

    Page(s): 1 - 4

    This paper describes a novel technique for decoupling two of the main sources of variation in 3-D facial structure: the subject's identity and expression. Decoupling and independently controlling these factors is a key step in many practical applications, and in this work it is achieved by modeling the face manifold with a bilinear model. Bilinear modeling, however, can only be applied to vectors, so a vector representation for each face is established first. To this end, we use a generic face model that is fitted to each face under the constraint that anatomical points are aligned. The effectiveness and applicability of the proposed method are demonstrated with an application to facial expression recognition.

  • Learning action descriptors for recognition

    Page(s): 5 - 8

    This paper evaluates different Restricted Boltzmann Machine (RBM) models in unsupervised, semi-supervised and supervised frameworks using information from human actions. After feeding these multilayer models with low-level features, we infer high-level discriminative features that greatly improve classification performance. This approach eliminates the difficult process of selecting good mid-level feature descriptors, replacing the feature selection and extraction process with a learning stage. Two main contributions are presented. First, a new sequence descriptor built from accumulated histograms of optical flow (aHOF) is presented. Second, comparative results from unsupervised, supervised and semi-supervised classification experiments are shown. The results show that the RBM architectures perform very well in our classification task and exhibit very good properties for semi-supervised learning.

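    A rough sketch of an aHOF-style sequence descriptor as the abstract describes it: per-frame orientation histograms of dense optical flow, weighted by flow magnitude and accumulated over the clip. The Farneback flow, the bin count and the normalisation below are assumptions rather than the paper's settings.

```python
# Hypothetical accumulated-histogram-of-optical-flow (aHOF) style descriptor.
import cv2
import numpy as np

def ahof_descriptor(frames, n_bins=8):
    """Accumulate magnitude-weighted orientation histograms of dense flow."""
    hist = np.zeros(n_bins)
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        h, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
        hist += h          # accumulate over the whole sequence
        prev = cur
    return hist / (hist.sum() + 1e-9)   # normalised sequence descriptor
```
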
  • Face tracking using a region-based mean-shift algorithm with adaptive object and background models

    Page(s): 9 - 12

    This paper proposes a technique for face tracking based on the mean-shift algorithm and the segmentation of the images into regions that are homogeneous in color. Object and background are explicitly modeled and updated throughout the tracking process. Color and shape information are used to define the face contours precisely, providing a mechanism to adapt the tracker to variations in object scale and to illumination and background changes.

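    A minimal colour-histogram mean-shift tracker built on OpenCV, shown only to illustrate the mean-shift core mentioned in the abstract; the paper's region-based segmentation and adaptive object/background models are not reproduced here.

```python
# Generic mean-shift tracking over a hue back-projection (not the paper's
# region-based variant); the object histogram is kept fixed for simplicity.
import cv2

def track_face(cap, x, y, w, h):
    ok, frame = cap.read()
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    roi_hist = cv2.calcHist([hsv[y:y + h, x:x + w]], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    window = (x, y, w, h)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        _, window = cv2.meanShift(back, window, term)   # move to the new mode
        yield window
```
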
  • Markerless human motion capture and pose recognition

    Page(s): 13 - 16

    In this paper, we present an approach for markerless human motion capture and pose recognition. Different body parts such as the torso and the hands are segmented from the whole body and tracked over time. A 2D model is used for torso detection and tracking, while a skin color model is used for hand tracking. Moreover, the 3D locations of these body parts are calculated and further used for pose recognition. By transferring the 2D and 3D coordinates of the torso and both hands into a normalized feature space, simple classifiers, such as the nearest mean classifier, are sufficient for recognizing predefined key poses. The experimental results show that the proposed approach can effectively detect and track the torso and both hands in video sequences, and that the extracted feature points give good classification results on the multi-class pose recognition problem. The implementation of the proposed approach is simple, easy to realize, and suitable for real gaming applications.

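    The nearest mean classifier named in the abstract is simple enough to sketch directly; the feature vectors (normalised torso and hand coordinates) are assumed to be computed elsewhere.

```python
# Nearest-mean (minimum-distance-to-class-centroid) classifier.
import numpy as np

class NearestMeanClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # distance of every sample to every class mean; closest mean wins
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]
```
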
  • Exploiting visual reranking to improve pseudo-relevance feedback for spoken-content-based video retrieval

    Page(s): 17 - 20

    In this paper, we propose an approach that utilizes visual features and conventional text-based pseudo-relevance feedback (PRF) to improve the results of semantic-theme-based video retrieval. Our visual reranking method is based on an Average Item Distance (AID) score. AID-based visual reranking is designed to improve the suitability of items at the top of the initial results list, i.e., those feedback items selected for use in query expansion. Our method is intended to target feedback items whose visual regularity is representative of the query's semantic theme. Experiments performed on the VideoCLEF 2008 data set, over a number of retrieval scenarios combining the inputs from speech-transcript-based (i.e., text-based) search and visual reranking, demonstrate the benefits of using AID-based visual representatives to compensate for the inherent problems of PRF, such as topic drift.

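    The abstract does not spell out the AID formula, so the sketch below is only one plausible reading: each top-ranked item is scored by its average visual distance to the other top-ranked items, and the most typical items are promoted before pseudo-relevance feedback. The distance measure and cut-off are assumptions.

```python
# Hypothetical Average Item Distance (AID) reranking of an initial result list.
import numpy as np

def aid_rerank(features, top_n=20):
    """features: (n_items, dim) visual features in initial rank order."""
    top_n = min(top_n, len(features))
    top = features[:top_n]
    dists = np.linalg.norm(top[:, None, :] - top[None, :, :], axis=2)
    aid = dists.sum(axis=1) / max(top_n - 1, 1)   # mean distance to the others
    order = np.argsort(aid)                       # visually "typical" items first
    return list(order) + list(range(top_n, len(features)))
```
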
  • Multiple description scalable coding for video streaming

    Page(s): 21 - 24

    Scalable video coding and multiple description video coding are two effective approaches for dealing with the problems posed by video streaming over error-prone packet-switched networks. In this paper, a multiple description scalable video coder based on the scalable video extension of H.264/AVC is presented. The proposed method downsamples the residual data of the spatial base layer to generate two descriptions. The produced descriptions have the same enhancement layers and the same motion vectors. Therefore, drift is avoided when one description is lost and the enhancement layers are received. Performance is evaluated through simulation experiments with different packet loss rates. The experimental results show that our proposed scheme achieves improvements of 3 dB or more compared to single description scalable video coding at a 10% packet loss rate.

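    An illustrative polyphase split of a base-layer residual into two descriptions (even vs. odd rows), with trivial concealment when one description is lost; the downsampling actually performed inside the SVC encoder is not specified in this abstract.

```python
# Toy two-description split of a residual frame; assumes an even row count.
import numpy as np

def split_residual(residual):
    return residual[0::2, :], residual[1::2, :]   # description 0 / description 1

def merge_or_conceal(d0, d1, shape):
    """Reassemble the residual; if one description is lost, reuse the other."""
    if d0 is None:
        d0 = d1
    if d1 is None:
        d1 = d0
    out = np.zeros(shape, dtype=float)
    out[0::2, :] = d0
    out[1::2, :] = d1
    return out
```
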
  • Feature-based video key frame extraction for low quality video sequences

    Page(s): 25 - 28

    We present an approach to key frame extraction for structuring user-generated videos on video-sharing websites (e.g., YouTube). Our approach is intended to link existing image search engines to video data. User-generated videos, unlike professional material, are unstructured, do not follow any fixed rules, and often have poor camera work. Furthermore, the coding quality is low due to low resolution and high compression. In a first step, we segment video sequences into shots by detecting gradual and abrupt cuts. Longer shots are then segmented into subshots based on location and camera-motion features. One representative key frame is extracted per subshot using visual attention features such as lighting, camera motion, face, and text appearance. These key frames are useful for indexing and for searching similar video sequences using MPEG-7 descriptors.

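    A simple colour-histogram cut detector as a stand-in for the shot segmentation step; the threshold, and the subshot segmentation and attention-based key-frame selection stages of the paper, are not reproduced.

```python
# Abrupt-cut detection from frame-to-frame histogram distance (OpenCV).
import cv2

def detect_cuts(frames, threshold=0.5):
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
        cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
        if prev_hist is not None:
            # Bhattacharyya distance between consecutive frame histograms
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                cuts.append(i)
        prev_hist = hist
    return cuts
```
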
  • Optimizing strategies for the exploration of social networks and associated data collections

    Page(s): 29 - 32

    Multimedia data collections embedded in social networks may be explored from the point of view of varying document and user characteristics. In this paper, we develop a unified model that embeds documents and users into coherent structures from which optimal subsets can be extracted. The result is the definition of navigation strategies that guide exploration of both the user and document networks, as a complement to classical search operations. An initial interface that realizes such browsing over documents is demonstrated in the context of cultural heritage.

  • Context awareness in graph-based image semantic segmentation via visual word distributions

    Page(s): 33 - 36

    This paper addresses the problem of image semantic segmentation (or semantic labelling), that is, the association of each image pixel with one of a predefined set of semantic categories (e.g., cow, car, face). We adopt a patch-based approach, in which super-pixel elements are obtained by oversegmenting the original image. We then train a conditional random field on heterogeneous descriptors extracted at different scales and locations. This discriminative graphical model can effectively account for the statistical dependence of neighbouring patches. For the more challenging task of capturing long-range patch dependency and contextualisation, we propose a descriptor based on histograms of visual words extracted in the vicinity of each patch at different scales. Experiments validate our approach by showing improvements with respect to both a base model that does not use distributed features and state-of-the-art works in the area.

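    A small sketch of the distributed context feature described above: a normalised histogram of visual-word labels collected in a neighbourhood around a patch centre. Vocabulary size and neighbourhood radius are assumptions.

```python
# Context descriptor: histogram of visual words around a super-pixel centre.
import numpy as np

def context_histogram(word_map, cy, cx, radius=40, n_words=200):
    """word_map: (H, W) integer array of visual-word indices."""
    h, w = word_map.shape
    y0, y1 = max(0, cy - radius), min(h, cy + radius)
    x0, x1 = max(0, cx - radius), min(w, cx + radius)
    hist = np.bincount(word_map[y0:y1, x0:x1].ravel(), minlength=n_words)
    return hist / (hist.sum() + 1e-9)
```
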
  • Full action instances for motion analysis

    Page(s): 37 - 40

    Motion analysis is an important component of surveillance, video annotation and many other applications. Current work focuses on the tracking of moving entities, the representation of their actions and the classification of sequences. A wide range of methods are available for the characterization and analysis of human activity. This work presents an original approach for the detailed characterization of activity in a video sequence. A novel framework for encoding and extracting representative, repeating segments of activities is presented, resulting in "Full Action Instances". We focus on the analysis of human activities; however, owing to its general design, the proposed algorithm can be extended to more general categories of action that contain repetitive components.

  • Detection of pan and zoom in soccer sequences based on H.264/AVC motion information

    Page(s): 41 - 44

    Unsupervised detection of pan and zoom in soccer sequences enables automatic classification of shots and match analysis. In this work we propose a pan and zoom (both in and out) detector specifically designed for low-resolution soccer sequences. Our implementation is based on analyzing the distribution of the motion vectors, already available in the encoded sequence, over a specific subset of reliable macroblocks (MBs) selected by means of inexpensive image preprocessing.

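    A rough illustration of separating pan from zoom with a block motion-vector field: pans produce a strong uniform component, while zooms produce vectors aligned with the radial direction from the image centre. Thresholds and the sign convention are placeholders, and the paper's reliable-macroblock selection is not shown.

```python
# Classify the dominant camera motion from decoded block motion vectors.
import numpy as np

def classify_camera_motion(mvs, centers, pan_thr=2.0, zoom_thr=0.5):
    """mvs: (N, 2) motion vectors; centers: (N, 2) block centres (pixels)."""
    radial = centers - centers.mean(axis=0)
    radial /= np.linalg.norm(radial, axis=1, keepdims=True) + 1e-9
    # mean projection on the outward radial direction; its sign depends on
    # whether the vectors point to the reference or the current frame
    zoom_score = (mvs * radial).sum(axis=1).mean()
    if abs(zoom_score) > zoom_thr:
        return "zoom-in" if zoom_score > 0 else "zoom-out"
    if np.linalg.norm(mvs.mean(axis=0)) > pan_thr:
        return "pan"
    return "static/other"
```
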
  • Coarse-to-fine moving region segmentation in compressed video

    Page(s): 45 - 48

    In this paper, we propose a coarse-to-fine segmentation method for extracting moving regions from compressed video. First, motion vectors are clustered to provide a coarse segmentation of moving regions at the block level. Second, boundaries between moving regions are identified, and finally, a fine segmentation is performed within boundary regions using edge and color information. Experimental results show that the proposed method can segment moving regions fairly accurately, with a sensitivity of 85% or higher and a specificity of over 95%.

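    The block-level coarse segmentation can be sketched as k-means clustering of the motion vectors taken from the bitstream; the boundary identification and edge/colour refinement steps of the paper are not shown.

```python
# Coarse moving-region labels from k-means over block motion vectors.
import numpy as np
from sklearn.cluster import KMeans

def coarse_moving_regions(mv_field, n_clusters=3):
    """mv_field: (H_blocks, W_blocks, 2) motion vectors from the bitstream."""
    h, w, _ = mv_field.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        mv_field.reshape(-1, 2))
    return labels.reshape(h, w)   # one cluster label per block
```
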
  • Evaluation of pixel- and motion vector-based global motion estimation for camera motion characterization

    Page(s): 49 - 52

    Pixel-based and motion vector-based global motion estimation (GME) techniques are evaluated in this paper with an automatic system for camera motion characterization. First, the GME techniques are compared using a frame-by-frame PSNR measurement on five video sequences. The best motion vector-based GME method is then evaluated together with a common and a simplified pixel-based GME technique for camera motion characterization. For this, selected unedited videos from the TRECVid 2005 BBC rushes corpus are used. We evaluate how the estimation accuracy of the global motion parameters affects the results for camera motion characterization in terms of retrieval measures. The results for this characterization show that the simplified pixel-based GME technique obtains results comparable with the common pixel-based GME method, and significantly outperforms an earlier proposed motion vector-based GME approach.

  • Summarizing raw video material using Hidden Markov Models

    Page(s): 53 - 56

    Besides the reduction of redundancy, the selection of representative segments is a core problem when summarizing collections of raw video material. We propose a novel approach, based on hidden Markov models (HMMs) trained on an annotated subset of the content, for selecting the segments to be included in a video summary. The observations of the HMM are relevance judgments of content segments based on different visual features, and the hidden states are the selection/non-selection of content segments. The HMM is designed to take all relevant scenes into account. We show that the approach generalizes well when trained on sufficiently diverse content.

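    A toy two-state HMM (select / skip) over discretised relevance judgments, decoded with Viterbi, to make the model structure from the abstract concrete; all probabilities are placeholders, not the parameters trained in the paper.

```python
# Two hidden states (skip/select), three observation levels (low/med/high relevance).
import numpy as np

states = ["skip", "select"]
start = np.log([0.7, 0.3])
trans = np.log([[0.8, 0.2],      # skip   -> skip / select
                [0.4, 0.6]])     # select -> skip / select
emit = np.log([[0.6, 0.3, 0.1],  # skip emits mostly low relevance
               [0.1, 0.3, 0.6]]) # select emits mostly high relevance

def viterbi(obs):
    """obs: sequence of relevance levels in {0, 1, 2}; returns state labels."""
    v, back = start + emit[:, obs[0]], []
    for o in obs[1:]:
        scores = v[:, None] + trans        # score of every transition
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]
```
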
  • Building summaries from web information sources

    Page(s): 57 - 60

    Document summarization techniques can be profitably used for the automatic production and delivery of multimedia information. In this paper, we describe a system for summarizing HTML documents (retrieved from the Internet) using several heuristic optimization criteria. An overview of the system and some preliminary results are presented.

  • Automatic metadata enrichment in news production

    Page(s): 61 - 64

    News production is characterized by complex and dynamic workflows in which it is important to produce and distribute news items as fast as possible. In this paper, we show how personalized distribution and consumption of news items can be enabled by automatically enriching news metadata with open linked datasets available on the web of data, thus providing a more pleasant experience to fastidious consumers, in which news content is presented within a broader historical context. Further, we present a faceted browser that provides a convenient way of exploring news items based on a NewsML-G2 ontology and rich semantic metadata.

  • Context-aware graph-based content representation for semantic navigation in multimedia news archives

    Page(s): 65 - 68

    Representing the content relations in multimedia data plays a crucial role in the delivery of information navigation services. However, current tools for information extraction and visualisation are still far from satisfactory. This paper addresses this task by providing an unsupervised framework for graph-based multimedia news content representation. The system uses hybrid clustering and a graph partitioning technique to aggregate semantically related multimedia news from the Web and TV. These aggregations are represented by oriented graphs at increasing levels of abstraction. This information can be accessed and browsed through a Web interface.

  • Automatic multimedia annotation through kernel combinations

    Page(s): 69 - 72

    An image classification model is presented based on the integration of visual and textual properties supported by complex kernel functions. Linguistic descriptions derived through Information Extraction from Web pages are integrated with the visual features of the corresponding images through independent kernel combinations. The impact of dimensionality reduction methods (i.e., LSA) and of proper combinations of redundant feature descriptions is also presented. The resulting workflow is widely applicable, as the comparative evaluation discussed here confirms.

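    One common way to realise an independent kernel combination is a weighted sum of precomputed Gram matrices fed to an SVM; the weights and the underlying visual and text kernels below are placeholders.

```python
# Weighted sum of a visual kernel and a text kernel for a precomputed-kernel SVM.
from sklearn.svm import SVC

def combined_kernel(K_visual, K_text, alpha=0.5):
    return alpha * K_visual + (1.0 - alpha) * K_text

# Training uses (n_train, n_train) Gram matrices, prediction uses
# (n_test, n_train) cross-Gram matrices:
#   clf = SVC(kernel="precomputed").fit(combined_kernel(Kv_tr, Kt_tr), y_train)
#   y_pred = clf.predict(combined_kernel(Kv_te, Kt_te))
```
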
  • Optimized scalable Multiple-Description Coding and FEC-based Joint Source-Channel Coding: A performance comparison

    Page(s): 73 - 76

    This paper proposes an optimized, low-complexity methodology that combines joint source-channel coding (JSCC) based on forward error correction (FEC) coding and scalable multiple description coding (MDC) mechanisms for error-resilient data transmission over error-prone packet-based channels. Additionally, we report a comparative theoretical analysis between (i) MDC and FEC-based scalable JSCC for sources encoded using (ii) single description coding (SDC) and (iii) MDC. Our analysis assumes ideal source coders that can achieve the theoretical performance bounds for a Gaussian source. This source model is independent of the actual implementation and therefore allows us to draw conclusions about the theoretical performance of these approaches and to determine which methodology should be employed in specific transmission scenarios.

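    The ideal Gaussian source coder referred to in the abstract obeys the closed-form distortion-rate bound D(R) = sigma^2 * 2^(-2R); the helper below simply evaluates that bound, which is the kind of quantity such a theoretical comparison builds on.

```python
# Distortion-rate bound of a memoryless Gaussian source with variance sigma^2.
def gaussian_distortion(rate_bits_per_sample, sigma2=1.0):
    return sigma2 * 2.0 ** (-2.0 * rate_bits_per_sample)

# e.g. at 1 bit/sample the bound is sigma^2 / 4
```
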
  • A fusion method for multispectral and panchromatic images based on HSI and Contourlet transformation

    Page(s): 77 - 80

    Fusion of multispectral and panchromatic remote sensing images is a procedure to obtain the spatial resolution and quality of the panchromatic image while preserving the spectral information of the multispectral image. In this paper, we present a new fusion method based on the HSI (Hue-Saturation-Intensity) and Contourlet transforms. First, we convert the multispectral image from the RGB color space into the HSI color space. Then, after applying the Contourlet transform to the panchromatic image and the I component of the multispectral image, we use an improved fusion rule based on PCA for the low-frequency sub-images and the maximum fusion rule for the high-frequency sub-images. Finally, the fused image is obtained by the inverse HSI transform. The experimental results show that the proposed fusion method not only enhances the spatial resolution of the fused image, but also preserves the spectral information of the original multispectral image.

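    A heavily reduced sketch of the fusion pipeline, using a single-level wavelet (pywt) as a stand-in for the Contourlet transform, a plain average for the low-frequency band (the paper uses a PCA-based rule), max-absolute selection for the high-frequency bands, and I = (R+G+B)/3 in place of the full HSI conversion. The multispectral image is assumed to be resampled to the panchromatic grid.

```python
# Simplified intensity-substitution pan-sharpening with a wavelet stand-in.
import numpy as np
import pywt

def fuse(ms_rgb, pan):
    i = ms_rgb.mean(axis=2)                          # intensity of the MS image
    cA_i, det_i = pywt.dwt2(i, "db2")
    cA_p, det_p = pywt.dwt2(pan, "db2")
    cA = 0.5 * (cA_i + cA_p)                         # low frequency: average
    det = tuple(np.where(np.abs(di) > np.abs(dp), di, dp)
                for di, dp in zip(det_i, det_p))     # high frequency: max-abs
    fused_i = pywt.idwt2((cA, det), "db2")[:i.shape[0], :i.shape[1]]
    return ms_rgb * (fused_i / (i + 1e-6))[..., None]   # rescale RGB by new/old intensity
```
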
  • Autonomous production of basketball videos from multi-sensored data with personalized viewpoints

    Page(s): 81 - 84

    We propose an autonomous system for personalized production of basketball videos from multi-sensored data under limited display resolution. We propose criteria for optimal planning of viewpoint coverage and camera selection for improved story-telling and perceptual comfort, and we design and implement the estimation process using statistical inference. Experiments verify the system and show that our method effectively alleviates the flickering visual artifacts caused by viewpoint switching, as well as discontinuous story-telling artifacts.

  • PoHMM-based human action recognition

    Page(s): 85 - 88

    In this paper we approach the human action recognition task using the Product of Hidden Markov Models (PoHMM). This approach allows us to obtain large state-space models from the normalized product of several simple HMMs. We compare this mixed graphical model with other directed multi-chain models such as the Coupled Hidden Markov Model (CHMM) and the Factorial Hidden Markov Model (FHMM), as well as with the Conditional Random Field (CRF), a particular case of undirected graphical models. Our results show that PoHMM outperforms the classification scores of these other state-space models on the KTH database using optical flow features.

  • Optimizing IPTV video delivery using SVC spatial scalability

    Page(s): 89 - 92

    This paper focuses on two typical constraints present in current video delivery platforms such as IPTV: accommodating the diversity of consumer end-devices and coping with the various bandwidth constraints of the deployed network access technologies. We propose using the scalable video coding (SVC) extension of H.264/AVC to address these two constraints: three different techniques for combining SD and HD versions of a video sequence in one SVC stream are presented, and we assess how quality scalability compares to spatial scalability in producing video streams that can be tailored to various bandwidth constraints. The results show that the three proposed techniques for combining SD and HD perform equally well and that, under certain conditions, spatial scalability outperforms quality scalability in coping with various bandwidth constraints.
