2012 13th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)

Date: 23–25 May 2012

Displaying Results 1 - 25 of 37
  • 2012 13th International Workshop on Image Analysis for Multimedia Interactive Services - WIAMIS 2012 [front cover]

    Publication Year: 2012 , Page(s): 1
  • 2012 13th International Workshop on Image Analysis for Multimedia Interactive Services [Copyright notice]

    Publication Year: 2012 , Page(s): 1
  • Welcome to WIAMIS 2012

    Publication Year: 2012 , Page(s): 1
  • Organizing committee

    Publication Year: 2012 , Page(s): 1
  • Improving identification by pruning: A case study on face recognition and body soft biometric

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (2)

    We investigate the capability of body soft biometrics to prune a hard-biometrics database, improving both retrieval speed and accuracy. Our pre-classification step, based on anthropometric measures, is developed on a large-scale medical dataset to guarantee the statistical significance of the results, and is tested in conjunction with a face recognition algorithm. Our assumptions are verified by testing the system on a chimera dataset. We clearly identify the trade-off among pruning, accuracy, and measurement error in an anthropometry-based system. Even in the worst case of ±10% biased anthropometric measures, our approach improves recognition accuracy while guaranteeing that only half of the database has to be considered.

  • A variational statistical framework for clustering human action videos

    Publication Year: 2012 , Page(s): 1 - 4

    In this paper, we present an unsupervised learning method, based on the finite Dirichlet mixture model and the bag-of-visual-words representation, for categorizing human action videos. The proposed Bayesian model is learned through a principled variational framework. A variational form of the Deviance Information Criterion (DIC) is incorporated within the proposed statistical framework for evaluating the correctness of the model complexity (i.e., the number of mixture components). The effectiveness of the proposed model is illustrated through empirical results.

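    The bag-of-visual-words representation this abstract builds on can be sketched in a few lines — a minimal, illustrative Python version, not the authors' implementation: local descriptors are quantized against a pre-learned codebook and counted into a normalized histogram. The codebook and the toy 2-D descriptors below are hypothetical stand-ins for real visual features.

    ```python
    import math

    def nearest_word(descriptor, codebook):
        """Index of the closest codebook entry (Euclidean distance)."""
        return min(range(len(codebook)),
                   key=lambda i: math.dist(descriptor, codebook[i]))

    def bag_of_visual_words(descriptors, codebook):
        """Histogram of codebook assignments, normalised to sum to 1."""
        counts = [0] * len(codebook)
        for d in descriptors:
            counts[nearest_word(d, codebook)] += 1
        total = sum(counts) or 1
        return [c / total for c in counts]
    ```

    In the paper's setting, each video would yield one such histogram, and the Dirichlet mixture would then cluster the histograms.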
  • Interactive segmentation and tracking of video objects

    Publication Year: 2012 , Page(s): 1 - 4

    This paper describes a mechanism to interactively segment objects from a sequence of video frames. The extracted object can later be embedded in a different background, associated with local-scale metadata, or used to train an automatic object detector. The workflow requires user interaction at two stages: the temporal segmentation of the frames containing the object and the generation of an object mask to initialize a video tracker. The mask is defined as a combination of regions generated by an image segmentation algorithm. This framework has been integrated in a publicly available annotation tool.

  • Visualisation of tennis swings for coaching

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (1)

    As a proficient tennis swing is a key element of success in tennis, many amateur players spend considerable time and effort perfecting their stroke mechanics, hoping to achieve more accuracy, consistency and power in their swing. To achieve these aims effectively, a number of independent aspects of technique need to be addressed, including forming a correct racket grip, shot timing, body orientation and a precise follow-through. Outside of a one-to-one coaching scenario, where constant professional feedback on technique can be provided, keeping all aspects of technique in mind can overwhelm amateur players. In this work, we have developed a set of visualisation tools to support the development of amateur tennis players between dedicated one-to-one coaching sessions in the areas of technique, timing and body posture. Our approach temporally aligns an amateur player's swing dynamics with those of an elite athlete, allowing direct technique comparison using augmented reality techniques.

  • Visual saliency estimation for video

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (1)

    The most eye-catching regions within an image or video can be captured by exploiting characteristics of the human visual system. In this paper we propose a novel method for modeling the visual saliency information in a video sequence. The proposed method incorporates wavelet decomposition and a model of the human visual system to capture spatiotemporal saliency information. A unique approach to capturing salient motion data and combining it with spatial intensity and orientation contrasts in the sequence is presented. The proposed method shows superior performance compared to existing state-of-the-art methods. The fast algorithm is simple to implement and is useful for many wavelet-based applications such as watermarking, compression and fusion.

  • Social event discovery by topic inference

    Publication Year: 2012 , Page(s): 1 - 4

    With people's keen interest in social media sharing websites, the multimedia research community faces new challenges and compelling opportunities. In this paper, we address the problem of automatically discovering specific events from social media data. Our approach assumes that events are joint distributions over the latent topics in a given place. Based on this assumption, topics are learned from large amounts of automatically collected social data using an LDA model. Then, the event distribution over a topic is estimated using least-mean-square optimization. We evaluate our methods on locations scattered around the world and show via our experimental results that the proposed framework offers promising performance for detecting events from social media.

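    The least-mean-square step the abstract mentions — estimating how an event distributes over learned topics — can be illustrated with a toy LMS gradient descent. This is a generic sketch, not the paper's code; the learning rate and step count are arbitrary choices, and the topic matrix here is a hypothetical 2-topic example.

    ```python
    def lms_fit(topic_matrix, target, lr=0.05, steps=2000):
        """Estimate weights w so that topic_matrix @ w approximates target,
        via least-mean-square gradient updates on each row in turn."""
        n_topics = len(topic_matrix[0])
        w = [1.0 / n_topics] * n_topics   # start from a uniform distribution
        for _ in range(steps):
            for i, row in enumerate(topic_matrix):
                pred = sum(row[j] * w[j] for j in range(n_topics))
                err = pred - target[i]
                for j in range(n_topics):
                    w[j] -= lr * err * row[j]   # LMS update
        return w
    ```

    With well-conditioned inputs the weights converge to the least-squares solution; the paper additionally constrains the estimate to behave as a distribution.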
  • An intelligent depth-based obstacle detection system for visually-impaired aid applications

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (2)

    In this paper, we present a robust depth-based obstacle detection system for computer vision. The system aims to assist the visually impaired in detecting obstacles, with distance information, for safety. In analyzing the depth map, segmentation and noise elimination are adopted to distinguish different objects according to the related depth information. An obstacle extraction mechanism is proposed to capture obstacles from various object properties revealed in the depth map. The proposed system can also be applied to emerging vision-based mobile applications, such as robots, intelligent vehicle navigation, and dynamic surveillance systems. Experimental results demonstrate that the proposed system achieves high accuracy. In indoor environments, the average detection rate is above 96.1%. Even in outdoor environments or in complete darkness, a 93.7% detection rate is achieved on average.

  • A skeleton based binarization approach for video text recognition

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (1)

    Text in video data comes in different resolutions and with heterogeneous backgrounds, resulting in difficult contrast ratios that often prevent valid OCR (optical character recognition) results. The text therefore has to be separated from its background before a standard OCR process is applied. This pre-processing task can be achieved by a suitable image binarization procedure. In this paper, we propose a novel binarization method for video text images with complex backgrounds. The proposed method is based on a seed-region-growing strategy. First, the text gradient direction is approximated by analyzing the content distribution of image skeleton maps. Then, the text seed pixels are selected by calculating the average grayscale value of skeleton pixels. Finally, an automated seed-region-growing algorithm is applied to obtain the text pixels. The accuracy of the proposed approach is demonstrated by evaluation.

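    A seed-region-growing pass of the kind the abstract describes can be sketched as a 4-connected flood fill on a grayscale grid. This is a generic illustration only — the paper's skeleton analysis and seed-selection steps are omitted, and the grayscale tolerance below is a hypothetical stand-in for its growing criteria.

    ```python
    from collections import deque

    def region_grow(image, seeds, tol):
        """Grow a region from seed pixels, adding 4-connected neighbours whose
        grey value is within `tol` of the average grey value of the seeds."""
        h, w = len(image), len(image[0])
        mean = sum(image[r][c] for r, c in seeds) / len(seeds)
        mask = [[False] * w for _ in range(h)]
        queue = deque(seeds)
        for r, c in seeds:
            mask[r][c] = True
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                        and abs(image[nr][nc] - mean) <= tol):
                    mask[nr][nc] = True
                    queue.append((nr, nc))
        return mask
    ```

    In a text-binarization setting, the returned mask would mark the text pixels and everything else would be treated as background.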
  • Social recommendation using speech recognition: Sharing TV scenes in social networks

    Publication Year: 2012 , Page(s): 1 - 4

    We describe a novel system that simplifies the recommendation of video scenes in social networks, thereby attracting a new audience for existing video portals. Users can select interesting quotes from a speech recognition transcript and share the corresponding video scene with their social circle with minimal effort. The system has been designed in close cooperation with the largest German public broadcaster (ARD) and was deployed at the broadcaster's public video portal. A twofold adaptation strategy adapts our speech recognition system to the given use case. First, a database of speaker-adapted acoustic models for the most important speakers in the corpus is created. We use spectral speaker identification to detect whether one of these speakers is speaking, and select the corresponding model accordingly. Second, we apply language model adaptation by exploiting prior knowledge about the video category.

  • Structural based side information creation with improved matching criteria for Wyner-Ziv video coding

    Publication Year: 2012 , Page(s): 1 - 4

    Wyner-Ziv video coding (WZVC) efficiency is highly dependent on the quality of the side information (SI) created at the decoder, typically through motion-compensated frame interpolation (MCFI) techniques. Since the decoder only has some reference decoded frames available, SI creation turns out to be a difficult problem in WZVC. In most of the available MCFI techniques, the matching criterion used for motion estimation only takes into account the (mean of absolute) pixel differences within a block, which is limiting for some content types. This paper proposes a structural SI creation approach with improved matching criteria, combining local image features obtained from the histogram of oriented gradients (HOG) with a boundary continuity criterion and the typical pixel-difference criterion. With structural SI creation, the motion estimation process becomes more robust, e.g. to illumination changes, thus improving the SI quality (by up to 0.4 dB) and reducing the bitrate (by up to 6% in terms of the Bjontegaard metric) relative to a state-of-the-art MCFI solution.

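    The pixel-difference matching criterion the abstract starts from is the classic sum of absolute differences (SAD) used in block-based motion estimation. A minimal exhaustive-search sketch is shown below for context — illustrative only, since the paper's contribution is precisely to augment this criterion with HOG and boundary-continuity terms; frame sizes and search range are toy values.

    ```python
    def sad(a, b):
        """Sum of absolute differences between two equally-sized blocks."""
        return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

    def block(frame, r, c, n):
        """n x n sub-block of `frame` with top-left corner (r, c)."""
        return [row[c:c + n] for row in frame[r:r + n]]

    def best_displacement(ref, cur, r, c, n, search):
        """Displacement (dr, dc) within +/-`search` that minimises the SAD
        between the block at (r, c) in `cur` and the displaced block in `ref`."""
        target = block(cur, r, c, n)
        candidates = [(dr, dc)
                      for dr in range(-search, search + 1)
                      for dc in range(-search, search + 1)
                      if 0 <= r + dr <= len(ref) - n
                      and 0 <= c + dc <= len(ref[0]) - n]
        return min(candidates,
                   key=lambda d: sad(target, block(ref, r + d[0], c + d[1], n)))
    ```

    A structural criterion like the paper's would replace the `sad`-only key with a weighted sum of pixel-difference, gradient-histogram and boundary-continuity terms.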
  • Improved B-slices DIRECT mode coding using motion side information

    Publication Year: 2012 , Page(s): 1 - 4

    The so-called DIRECT coding mode plays an important role in the RD performance of predictive video coding such as the H.264/AVC and MPEG-4 standards, because there is typically a large probability that the DIRECT mode is selected in B-slices by the rate-distortion optimization (RDO) process. Although the current H.264/AVC DIRECT coding procedure exploits the motion vectors (MV) obtained from the reference frames in a rather effective way, it may still be improved by considering better motion information, such as motion data derived from the side information (SI) creation process typical of distributed video coding. Therefore, this paper proposes an improved DIRECT coding mode for B-slices that efficiently exploits motion side information available at both the encoder and decoder. Experimental results show that the proposed DIRECT coding mode provides up to 8% bitrate savings or a 0.46 dB PSNR improvement.

  • Why did you record this video? An exploratory study on user intentions for video production

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (2)

    Why do people record videos and share them? While the question seems simple, user intentions have not yet been investigated for video production and sharing. A general taxonomy would lead to adapted information systems and multimedia interfaces tailored to users' intentions. We contribute (1) an exploratory user study with 20 participants, examining the various facets of user intentions for video production and sharing in detail, and (2) a novel set of user intention clusters for video production, grounded empirically in our study results. We further reflect on existing work in specialized domains (i.e. video blogging and mobile phone cameras) and show that prevailing models used in other multimedia fields (e.g. photography) cannot be used as-is to reason about video recording and sharing intentions.

  • Using a 3D cylindrical interface for image browsing to improve visual search performance

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (4)

    In this paper we evaluate a 3D cylindrical interface that arranges image thumbnails by visual similarity for the purpose of image browsing. Through a user study, we compare the performance of this interface to that of a common scrollable 2D list of thumbnails in a grid arrangement. Our evaluation shows that the 3D cylinder interface enables significantly faster visual search and is the preferred search interface for the majority of tested users.

  • Reconstruction for 3D immersive virtual environments

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (1)

    The future of tele-conferencing lies in multi-party 3D Tele-Immersion (TI) and TI environments that can support realistic inter-personal communications and virtual interaction among participants. In this paper, we address two important issues pertinent to TI environments. The paper focuses on techniques for the real-time 3D reconstruction of moving humans from multiple Kinect devices. The off-line generation of real-life 3D scenes from visual data captured by non-professional users is also addressed. Experimental results are provided that demonstrate the efficiency of the methods, along with an example of mixing the real with the virtual in a shared space.

  • Reliability measure for propagation-based stereo matching

    Publication Year: 2012 , Page(s): 1 - 4

    Seed-propagation-based stereo matching can help to reduce the ambiguity that occurs when a pixel from one image has several putative correspondents in the other due to difficult areas (repetitive patterns, homogeneous areas, occlusions and depth discontinuities). These methods rely on previously computed matches (seeds) to reduce the size of the search area, and thus the number of candidates. One class of these iterative methods selects the “best” seed at each iteration to prevent the propagation of errors. However, little attention has been paid to this best-first selection criterion, for which a correlation score is usually employed. This value by itself does not consider any ambiguity and is not well suited to selecting the most reliable seed. Therefore, in this paper we introduce a reliability measure. It has the advantage of taking into account information from the other candidates and leads, according to the provided experimental evaluation, to better results than the correlation score alone.

  • Stereo video completion for rig and artefact removal

    Publication Year: 2012 , Page(s): 1 - 4

    Video reconstruction has become an important tool for rig and artefact removal in cinema postproduction. In this paper we are concerned with reconstructing stereo video material. We propose a method that builds on existing exemplar-based video inpainting techniques and includes a dedicated view-consistency constraint. Within a constrained texture synthesis framework, we use reconstructed motion and inter-frame disparity vectors as guides for finding appropriate example source patches from parts of the sequence that minimise spatial and stereo discrepancies. We then introduce coherent patch sewing to reconstruct the missing region by stitching the source patches together. Compared to previous methods, our results show increased spatial and view consistency.

  • Recovering quasi-real occlusion-free textures for facade models by exploiting fusion of image and laser street data and image inpainting

    Publication Year: 2012 , Page(s): 1 - 4

    In this paper we present relevant results for the texturing of 3D urban facade models by exploiting the fusion of terrestrial multi-source data acquired by a Mobile Mapping System (MMS) and image inpainting. Current 3D urban facade models are often textured using images that contain street objects. These objects constitute occlusions, since they are located between the acquisition system and the facades. We show the potential of georeferenced images and 3D point clouds acquired at street level by the MMS for generating occlusion-free facade textures. We describe a methodology for reconstructing quasi-real textures of facades that are highly occluded by wide frontal objects.

  • Geo-tagging online videos using semantic expansion and visual analysis

    Publication Year: 2012 , Page(s): 1 - 4

    The association of geographical tags with multimedia resources enables browsing and searching online multimedia repositories using geographical criteria, but millions of videos and images that are already online but not geo-tagged remain invisible to such systems. This situation calls for the development of automatic geo-tagging techniques capable of estimating the location where a video or image was taken. This paper presents a bimodal geo-tagging system for online videos based on extracting and expanding the geographical information contained in the textual metadata and on visual similarity criteria. The performance of the proposed system is evaluated on the MediaEval 2011 Placing task data set and compared against the participants in that workshop.

  • Who are the users of a video search system? Classifying a heterogeneous group with a profile matrix

    Publication Year: 2012 , Page(s): 1 - 4

    Formulating requirements for a video search system can be a challenging task when everyone is a possible user. This paper explores the possibility of classifying users by creating a Profile Matrix, placing users on two axes: experience and goal-directedness. This enables us to describe the characteristics of the subgroups and investigate differences between them. We created Profile Matrices by classifying 850 respondents of a survey conducted as part of a requirements study for a video search system. We conclude that the Profile Matrix indeed enables us to classify subgroups of users and describe their characteristics. The current research is limited to descriptions of subgroups and analysis of differences between these subgroups. In the future, we want to investigate what these differences mean for users' performance with and acceptance of a video search system, and to explore the use of a Profile Matrix for other types of search systems.

  • Exploiting gaze movements for automatic video annotation

    Publication Year: 2012 , Page(s): 1 - 4

    This paper proposes a framework for automatic video annotation that exploits gaze movements during interactive video retrieval. In this context, we use a content-based video search engine to perform video retrieval, during which we capture the user's eye movements with an eye-tracker. We exploit these data by generating feature vectors, which are used to train a classifier that can identify shots of interest for new users. The queries submitted by new users are clustered into search topics, and the viewed shots are annotated as relevant or non-relevant to the topics by the classifier. The evaluation shows that aggregated gaze data can be used effectively for video annotation purposes.

  • Student-t background modeling for persons' fall detection through visual cues

    Publication Year: 2012 , Page(s): 1 - 4
    Cited by:  Papers (2)

    This article presents a robust, real-time background subtraction algorithm able to operate properly in complex, dynamically changing visual conditions and in indoor/outdoor environments, based on a single, cheap monocular camera such as a webcam. The algorithm uses an image grid and models each pixel of the grid as a mixture of adaptive Student-t distributions. This approach makes the algorithm robust and efficient in terms of computational cost and memory requirements, and thus suitable for large-scale implementations. The proposed algorithm is applied to the problem of human fall detection, which presents highly complex visual content. Finally, the performance of this scheme is compared with that of the scheme proposed in [1] by the same authors.

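    The per-pixel adaptive modeling the abstract describes can be illustrated with a drastically simplified running mean/variance model — a single-mode stand-in for the paper's adaptive Student-t mixture, shown only to convey the update-and-threshold idea. The learning rate `alpha` and the threshold `k` below are hypothetical choices, not values from the paper.

    ```python
    class RunningBackgroundModel:
        """Per-pixel running mean/variance background model (a simplified
        stand-in for an adaptive Student-t mixture per pixel)."""

        def __init__(self, n_pixels, alpha=0.05, k=2.5):
            self.mean = [0.0] * n_pixels
            self.var = [15.0 ** 2] * n_pixels   # generous initial variance
            self.alpha = alpha                  # adaptation rate
            self.k = k                          # foreground threshold in std devs

        def update(self, frame):
            """Classify each pixel (True = foreground) and adapt the model."""
            fg = []
            for i, x in enumerate(frame):
                d = x - self.mean[i]
                fg.append(d * d > (self.k ** 2) * self.var[i])
                self.mean[i] += self.alpha * d
                self.var[i] += self.alpha * (d * d - self.var[i])
            return fg
    ```

    After the model has adapted to a static scene, a sudden intensity change at a pixel exceeds the threshold and is flagged as foreground — the same mechanism, with heavier-tailed Student-t components, underlies the robustness claimed in the abstract.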