Ninth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '08)

Date: 7-9 May 2008

Displaying Results 1 - 25 of 74
  • [Title page i]

    Publication Year: 2008 , Page(s): i
    PDF (145 KB)
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2008 , Page(s): iii
    PDF (276 KB)
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2008 , Page(s): iv
    PDF (94 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2008 , Page(s): v - ix
    PDF (164 KB)
    Freely Available from IEEE
  • Welcome from the Chairs

    Publication Year: 2008 , Page(s): x
    PDF (142 KB) | HTML
    Freely Available from IEEE
  • Committees

    Publication Year: 2008 , Page(s): xi - xiii
    PDF (121 KB)
    Freely Available from IEEE
  • Additional reviewers

    Publication Year: 2008 , Page(s): xiv
    PDF (114 KB)
    Freely Available from IEEE
  • Robust Person Detection for Surveillance Using Online Learning

    Publication Year: 2008 , Page(s): 1
    PDF (122 KB)

    Recently, there has been a considerable amount of research on methods for person detection. This talk will focus on methods for person detection in a surveillance setting (known environment). We will demonstrate that in this setting one can build robust and highly reliable person detectors by using on-line learning methods. In particular, I will first discuss "conservative learning", which is able to learn a person detector without any hand-labelling effort. As a second example, I will discuss a recently developed grid-based person detector. The basic idea is to considerably simplify the detection problem by considering individual image locations separately. This allows us to use simple adaptive classifiers which are trained on-line. Due to the reduced complexity, we can use a simple update strategy that requires only a few positive samples and is stable by design. This is an essential property for real-world applications, which require operation 24 hours a day, 7 days a week. During the talk we will illustrate our results on video sequences and standard benchmark databases.

  • Unleashing Video Search

    Publication Year: 2008 , Page(s): 2
    PDF (134 KB) | HTML

    Video is rapidly becoming a regular part of our digital lives. However, its tremendous growth is raising users' expectations that video will be as easy to search as text. Unfortunately, users are still finding it difficult to find relevant content, and today's solutions are not keeping pace on problems ranging from video search to content classification to automatic filtering. In this talk we describe recent techniques that leverage the computer's ability to effectively analyze visual features of video and apply statistical machine learning techniques to classify video scenes automatically. We examine related efforts on the modeling of large video semantic spaces and review public evaluations such as TRECVID, which are greatly facilitating research and development on video retrieval. We discuss the role of MPEG-7 as a way to store metadata generated for video in a fully standards-based, searchable representation. Overall, we show how these approaches together go a long way toward truly unleashing video search.

  • Recent, Current and Future Developments in Video Coding

    Publication Year: 2008 , Page(s): 3
    PDF (123 KB) | HTML

    Abstract form only given. Most recent attention in the development of video coding algorithms has been devoted to the ITU-T Rec. H.264 | ISO/IEC 14496-10 advanced video coding standard. Recent and current extensions to this standard include developments for professional applications, highly efficient scalable video coding, and multi-view video coding. Finally, digital video over various networks, at higher and higher resolutions, is becoming a reality. While this technology is progressing and further optimizations are sought, new challenges appear on the horizon. New types of displays include 3D capabilities, requiring the generation of additional view perspectives beyond the available camera positions. Cameras and displays are appearing with ever-increasing frame rates and sizes. The tremendous number of different applications for digital video requires additional flexibility and reconfigurability of devices. And last but not least, increased compression efficiency (meaning rate reduction versus processing cost) is again becoming more important with ever-increasing numbers of pixels to be transmitted. The talk will focus on possible solutions to these challenges and discuss their current maturity.

  • A Comparative Study of Classification Techniques for Knowledge-Assisted Image Analysis

    Publication Year: 2008 , Page(s): 4 - 7
    Cited by:  Papers (2)
    PDF (304 KB) | HTML

    In this paper, four individual approaches to region classification for knowledge-assisted semantic image analysis are presented and comparatively evaluated. All of the examined approaches realize knowledge-assisted analysis via implicit knowledge acquisition, i.e. they are based on machine learning techniques such as support vector machines (SVMs), self-organizing maps (SOMs), genetic algorithms (GAs), and particle swarm optimization (PSO). Under all examined approaches, each image is initially segmented and suitable low-level descriptors are extracted for every resulting segment. Then, each of the aforementioned classifiers is applied to associate every region with a predefined high-level semantic concept. An appropriate evaluation framework has been employed for the comparative evaluation of the above algorithms under varying experimental conditions.

  • A Semantic Multimedia Analysis Approach Utilizing a Region Thesaurus and LSA

    Publication Year: 2008 , Page(s): 8 - 11
    Cited by:  Papers (2)
    PDF (370 KB) | HTML

    This paper presents an approach to high-level feature detection in video documents, using a region thesaurus and latent semantic analysis. A video shot is represented by a single keyframe, and MPEG-7 features are extracted from its coarse regions. A clustering algorithm is applied to all extracted regions and a region thesaurus is constructed, which assists the mapping of low- to high-level features through a model vector representation. Latent semantic analysis is then applied to the model vectors to exploit the latent relations among region types, aiming to improve detection performance. The proposed approach is thoroughly examined using TRECVID 2007 development data.

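The low- to high-level mapping in this entry routes keyframe model vectors through latent semantic analysis. A minimal sketch of that LSA step via truncated SVD follows; it is an illustrative reconstruction, not the authors' implementation, and the matrix layout and `rank` parameter are assumptions.

```python
import numpy as np

def lsa_transform(model_vectors, rank=2):
    """Project keyframe model vectors onto a low-rank latent space.

    model_vectors: (n_keyframes, n_region_types) matrix whose entry (i, j)
    measures how strongly region type j from the thesaurus appears in
    keyframe i. Truncated SVD keeps only the strongest latent relations
    among region types.
    """
    u, s, vt = np.linalg.svd(model_vectors, full_matrices=False)
    return u[:, :rank] * s[:rank]  # latent coordinates of each keyframe
```

Keyframes with near-identical region-type profiles land close together in the latent space, which is what lets a concept detector generalise across co-occurring region types.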
  • Exploiting Temporal and Inter-concept Co-occurrence Structure to Detect High-Level Features in Broadcast Videos

    Publication Year: 2008 , Page(s): 12 - 15
    Cited by:  Papers (1)
    PDF (187 KB) | HTML

    In this paper the problem of detecting high-level features from video shots is studied. In particular, we explore the possibility of taking advantage of the temporal and inter-concept co-occurrence patterns that the high-level features of a video sequence exhibit. Here we present two straightforward techniques for the task: N-gram models and clustering of temporal neighbourhoods. We demonstrate the usefulness of these techniques on data sets of the TRECVID high-level feature detection tasks of the years 2005-2007.

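One of the two techniques named in this abstract, the N-gram model, can be sketched as a bigram over per-shot concept labels: transition counts become P(next concept | current concept), which then rescores a detector's raw confidences. The data structures and the blending `weight` are assumptions for illustration, not the paper's exact formulation.

```python
from collections import Counter, defaultdict

def train_bigram(shot_labels):
    """Estimate P(next concept | current concept) from labelled shot sequences."""
    counts = defaultdict(Counter)
    for sequence in shot_labels:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1
    return {c: {n: k / sum(ctr.values()) for n, k in ctr.items()}
            for c, ctr in counts.items()}

def rescore(prior, previous_concept, bigram, weight=0.5):
    """Blend a detector's prior score with the bigram transition probability."""
    transition = bigram.get(previous_concept, {})
    return {c: (1 - weight) * p + weight * transition.get(c, 0.0)
            for c, p in prior.items()}
```

If "road" follows "sky" more often than "sky" follows itself, a shot after a "sky" shot gets its "road" score boosted even when the raw detector is undecided.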
  • Exploiting Spatial Context in Image Region Labelling Using Fuzzy Constraint Reasoning

    Publication Year: 2008 , Page(s): 16 - 19
    PDF (604 KB) | HTML

    We present an approach for integrating explicit knowledge about the spatial context of objects into image region labelling. Our approach is based on spatial prototypes that represent the typical arrangement of objects in images. We use Fuzzy Constraint Satisfaction Problems as the underlying formal model for producing a labelling that is consistent with the spatial constraints of prototypes.

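A toy version of the fuzzy constraint reasoning described here: each spatial-prototype constraint returns a satisfaction degree in [0, 1], and the joint degree of a labelling is the minimum over all constraints, the usual conjunction in fuzzy CSPs. The exhaustive search and the constraint encoding are illustrative assumptions; practical FCSP solvers prune rather than enumerate.

```python
from itertools import product

def best_labelling(regions, labels, constraints):
    """Search for the labelling that maximises joint constraint satisfaction.

    constraints: dict mapping a region pair (i, j) to a function that returns
    the degree in [0, 1] to which (label_i, label_j) matches the spatial
    prototype (e.g. "sky appears above sea").
    """
    def satisfaction(assignment):
        # Fuzzy conjunction: the labelling is only as good as its worst constraint.
        return min(f(assignment[i], assignment[j])
                   for (i, j), f in constraints.items())
    return max(product(labels, repeat=len(regions)), key=satisfaction)
```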
  • Automatically Segmenting LifeLog Data into Events

    Publication Year: 2008 , Page(s): 20 - 23
    Cited by:  Papers (11)
    PDF (341 KB) | HTML

    A personal lifelog of visual information can be very helpful as a human memory aid. The SenseCam, a passively capturing wearable camera, captures an average of 1,785 images per day, which equates to over 600,000 images per year. So as not to overwhelm users, it is necessary to deconstruct this substantial collection of images into digestible chunks of information, i.e. into distinct events or activities. This paper improves on previous work on automatic segmentation of SenseCam images into events by up to 29.2%, primarily through the introduction of intelligent threshold selection techniques, but also through improvements in the selection of normalisation, fusion, and vector distance techniques. Here we use the most extensive dataset ever used in this domain: 271,163 images collected by 5 users over a period of one month, with manually ground-truthed events.

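The kind of threshold-based event segmentation this paper improves upon can be sketched as follows: compute distances between consecutive image descriptors and declare an event boundary wherever the distance exceeds an adaptive threshold. The descriptor type and the mean-plus-std threshold rule are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def segment_events(features, scale=1.0):
    """Split a day's image stream into events at large visual discontinuities.

    features: (n_images, d) array of per-image descriptors.
    A boundary is declared wherever the distance between consecutive images
    exceeds an adaptive threshold (mean + scale * std of all distances).
    Returns the indices of images that start a new event.
    """
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    threshold = diffs.mean() + scale * diffs.std()
    return [0] + [i + 1 for i, d in enumerate(diffs) if d > threshold]
```

Deriving the threshold from the day's own distance statistics is what makes the segmentation adapt to quiet days and busy days alike.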
  • Identifying Different Settings in a Visual Diary

    Publication Year: 2008 , Page(s): 24 - 27
    PDF (236 KB) | HTML

    We describe an approach to identifying specific settings in large collections of photographs corresponding to a visual diary. An algorithm developed for setting detection should be capable of clustering images captured at the same real-world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. The goal of the work reported here is to automatically detect settings in images taken over a single week. We achieve this using scale-invariant feature transform (SIFT) features and X-means clustering. In addition, we also explore how the use of location-based metadata can aid this process.

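X-means, used here for setting detection, extends k-means with BIC-driven selection of the number of clusters, so the system need not know in advance how many settings a week contains. The sketch below scores k-means solutions for several k with a spherical-Gaussian BIC and keeps the best; it is a simplified stand-in (true X-means splits centroids recursively), and the farthest-point initialisation is an added assumption.

```python
import numpy as np

def init_centroids(x, k):
    """Farthest-point initialisation: robust for well-separated clusters."""
    centroids = [x[0]]
    for _ in range(k - 1):
        d = np.min(((x[:, None] - np.array(centroids)[None]) ** 2).sum(-1), axis=1)
        centroids.append(x[np.argmax(d)])
    return np.array(centroids)

def kmeans(x, k, iters=50):
    """Plain k-means; returns (centroids, labels)."""
    centroids = init_centroids(x, k)
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels

def bic(x, centroids, labels):
    """Spherical-Gaussian BIC score (higher is better), as used by X-means."""
    n, d = x.shape
    k = len(centroids)
    rss = ((x - centroids[labels]) ** 2).sum()
    variance = max(rss / (d * (n - k)), 1e-12)
    sizes = np.bincount(labels, minlength=k)
    sizes = sizes[sizes > 0]
    log_lik = (np.sum(sizes * np.log(sizes)) - n * np.log(n)
               - 0.5 * n * d * np.log(2 * np.pi * variance)
               - 0.5 * d * (n - k))
    n_params = k * (d + 1)  # approximate free-parameter count
    return log_lik - 0.5 * n_params * np.log(n)

def choose_k(x, k_max=6):
    """X-means-style model selection: pick the k with the best BIC."""
    return max(range(1, k_max + 1), key=lambda k: bic(x, *kmeans(x, k)))
```

The BIC penalty stops the model from inventing a new "setting" for every minor lighting change, which plain k-means with a fixed large k would happily do.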
  • Using Neighborhood Distributions of Wavelet Coefficients for On-the-Fly, Multiscale-Based Image Retrieval

    Publication Year: 2008 , Page(s): 28 - 31
    Cited by:  Papers (3)
    PDF (550 KB) | HTML

    In this paper, we define a similarity measure to compare images in the context of (indexing and) retrieval. We use the Kullback-Leibler (KL) divergence to compare sparse multiscale image descriptions in a wavelet domain. The KL divergence between wavelet coefficient distributions has already been used as a similarity measure between images. The novelty here is twofold. Firstly, we consider the dependencies between the coefficients by means of distributions of mixed intra-/interscale neighborhoods. Secondly, to cope with the high dimensionality of the resulting description space, we estimate the KL divergences in the k-th nearest neighbor framework, instead of using classical fixed-size kernel methods. Query-by-example experiments are presented.

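The k-th-nearest-neighbor estimation of KL divergence mentioned above can be sketched with a standard sample-based estimator (in the Wang-Kulkarni-Verdú style): compare, for each sample of P, the distance to its k-th neighbour within P against the distance to its k-th neighbour within Q. Which estimator variant the paper uses is not stated here, so treat this as an assumption.

```python
import numpy as np

def knn_kl_divergence(x, y, k=1):
    """k-NN estimate of KL(P||Q) from samples x ~ P (n, d) and y ~ Q (m, d)."""
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    n, d = x.shape
    m = y.shape[0]
    # rho: distance from each x_i to its k-th nearest neighbour among the other x's
    dxx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dxx, np.inf)
    rho = np.sort(dxx, axis=1)[:, k - 1]
    # nu: distance from each x_i to its k-th nearest neighbour among the y's
    dxy = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1))
    nu = np.sort(dxy, axis=1)[:, k - 1]
    # KL(P||Q) ~= (d/n) * sum log(nu/rho) + log(m/(n-1))
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

No density model is fitted at all, which is exactly why this approach scales to the high-dimensional neighbourhood descriptions the paper uses, where fixed-size kernels break down.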
  • An Evaluation of Local Features for Face Detection and Localization

    Publication Year: 2008 , Page(s): 32 - 35
    Cited by:  Papers (1)
    PDF (341 KB) | HTML

    Local features have the ability to overcome the major drawback of traditional, holistic object detection approaches, because they are inherently invariant to geometric deformation and pose; in addition, scale and rotation invariance can easily be achieved as well. However, the selection of discriminative feature locations and local descriptions is a complex task that has not been generally solved. In the case of face detection, features must possess the discriminative power to differentiate between facial parts and cluttered backgrounds while remaining person-agnostic. A multitude of suggestions for selecting facial features for tracking or identification/recognition can be found in the literature, most of which rely on semi-automatic or manual definition of the feature locations. In contrast, fully automatic feature selection and generic description approaches like SIFT and SURF have been shown to provide excellent performance for rigid as well as non-rigid registration and even for object class recognition. While quantitative evaluations exist that give a hint of the registration performance of the competing designs, these scenarios are not directly transferable to object detection. In this paper we provide a qualitative and quantitative analysis of existing interest point detectors as well as local descriptions in the context of face detection and localization.

  • Interest Based Selection of User Generated Content for Rich Multimedia Services

    Publication Year: 2008 , Page(s): 36 - 40
    PDF (489 KB) | HTML

    In view of the overwhelming popularity of user-generated content, both in terms of production and consumption, new intelligent services are needed to help users find the content they need and to enhance existing services with suitably selected content. In this paper we present a set of algorithms for retrieving content based on dynamic user profiles and learning capabilities (e.g. based on user feedback). The profile information is used in content searches as well as for assisting the user input analysis process (i.e. speech recognition). To illustrate the approach taken, a rich communication service is presented, in which the basic service (i.e. voice/video conferencing) is enhanced by showing pictures to the users in real time, based on the topic of their conversation and their specific interests.

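Profile-driven selection with feedback-based learning might look like the following sketch: items are ranked by the overlap between their tags and a weighted interest profile, and explicit feedback nudges the weights. The tag-weight representation and the additive update rule are assumptions, not the paper's algorithms.

```python
def rank_content(items, profile):
    """Rank candidate items by overlap between their tags and the user profile.

    profile: dict tag -> interest weight, updated from user feedback.
    items: dict name -> set of tags describing the item.
    """
    score = lambda tags: sum(profile.get(t, 0.0) for t in tags)
    return sorted(items, key=lambda name: score(items[name]), reverse=True)

def feedback(profile, tags, liked, step=0.1):
    """Nudge profile weights up or down after explicit user feedback."""
    for t in tags:
        profile[t] = profile.get(t, 0.0) + (step if liked else -step)
```

In the conferencing scenario described above, the tags would come from speech-recognised conversation topics, so the same profile that ranks pictures also biases the recogniser's hypotheses.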
  • Using MPEG-7 for Generic Audiovisual Content Automatic Summarization

    Publication Year: 2008 , Page(s): 41 - 45
    Cited by:  Papers (1)
    PDF (441 KB) | HTML

    This paper proposes and evaluates a fully automatic summarization application for generic audiovisual content based on MPEG-7 compliant hierarchical summary descriptions, which provide flexibility, low complexity, and interoperability. The novelty of this paper lies in the exploitation of a three-feature, low-level arousal model to generate the summary metadata needed to instantiate MPEG-7 compliant summary descriptions, with the advantages this brings in terms of interoperability. Moreover, a novel, solid performance evaluation methodology has been proposed and applied.

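A three-feature arousal model of the kind described can be sketched by normalising each low-level feature curve, averaging them into an arousal curve, and keeping the highest-arousal segments for the summary. The choice of features, equal weights, and the summary budget are all placeholders, not the paper's parameters.

```python
import numpy as np

def arousal_summary(features, budget=0.2, weights=(1/3, 1/3, 1/3)):
    """Select summary segments from a weighted average of arousal features.

    features: (3, n_segments) array-like of per-segment low-level feature
    curves (e.g. motion activity, shot-cut rate, audio energy - assumed here).
    Returns the indices of the highest-arousal segments, up to `budget`
    as a fraction of the total segment count.
    """
    f = np.asarray(features, dtype=float)
    # Min-max normalise each feature so no single scale dominates.
    norm = (f - f.min(axis=1, keepdims=True)) / (np.ptp(f, axis=1, keepdims=True) + 1e-12)
    arousal = np.average(norm, axis=0, weights=weights)
    n_keep = max(1, int(budget * f.shape[1]))
    return sorted(np.argsort(arousal)[-n_keep:])
```

The arousal curve itself is exactly the kind of intermediate metadata that an MPEG-7 hierarchical summary description can carry, which is where the interoperability claim comes from.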
  • Multimedia Adaptation Decisions Modelled as Non-deterministic Operations

    Publication Year: 2008 , Page(s): 46 - 49
    Cited by:  Papers (1)
    PDF (370 KB) | HTML

    This paper describes how a multimedia adaptation framework can automatically decide the sequence of operations to be executed in order to adapt an MPEG-21 Digital Item to the MPEG-21 description of the usage environment in which it will be consumed. The main innovation of this work with respect to previous multimedia adaptation decision models is that in the proposed approach decisions can be made without knowing the exact behaviour of the operations that are going to be executed.

  • Performance Analysis of Scalable Video Adaptation: Generic versus Specific Approach

    Publication Year: 2008 , Page(s): 50 - 53
    Cited by:  Papers (2)
    PDF (274 KB) | HTML

    This paper provides a performance analysis of adaptation approaches designed for scalable media resources. In particular, we investigate the streaming of media resources compliant with the scalable video coding (SVC) extensions of advanced video coding (AVC) within heterogeneous environments, i.e., terminals and networks with different capabilities. To this end, we have developed a test-bed in order to analyze two different approaches for the adaptation of scalable media resources: a generic approach that is applicable independently of the actual scalable coding format used, and a specific approach built especially for SVC. The results show that when adaptation is required, the generic approach clearly outperforms the approach specifically built for SVC.

  • Towards Automated Robust Vision-Based Surveillance

    Publication Year: 2008 , Page(s): 54
    PDF (591 KB)

    Summary form only given. COST is one of the longest-running instruments funded by the EU RTD Framework Program. The members of the COST 292 action (see: http://www.cost292.org/) have a long tradition of involvement in WIAMIS, dating back to the establishment of this event. Since 2005, the action has been invited to organize a special session, and this tradition was continued for WIAMIS 2008. This year, the special session focused on a subset of the broad research remit of the action; specifically, it targets research in the field of automated visual surveillance. This focus reflects a growing interest among COST 292 participants in this increasingly important area. The goal of this special session is to help bring together researchers working on this topic, both within COST 292 and beyond, with a view to stimulating future collaborative research in this area. The widely circulated call for papers targeted submissions on a broad range of topics covered by the above-mentioned focus, including visual computation, event and activity modeling and analysis, multiple-stream analysis, architectures, indexing and storage, coding, and applications. All papers were peer reviewed by at least three reviewers drawn from a combination of the WIAMIS Technical Program Committee and a special Program Subcommittee set up specifically for this session. Acceptance criteria considered overall quality as well as relevance to the topics of the special session. In the end, five papers were accepted for oral presentation. The accepted papers address scene modeling, event detection, object recognition (human and non-human), and fusion of multiple complementary data sources.

  • Robust People Detection by Fusion of Evidence from Multiple Methods

    Publication Year: 2008 , Page(s): 55 - 58
    Cited by:  Papers (2)
    PDF (346 KB) | HTML

    This paper describes and evaluates an algorithm for real-time people detection in video sequences based on the fusion of evidence provided by three simple independent people detectors. Experiments with real video sequences show that the proposed integration-based approach is effective, robust and fast by combining simple algorithms.

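Fusion of evidence from several simple detectors can be as plain as a weighted mean of per-window confidences, accepting a detection when the fused score clears a threshold. This sketch assumes score-level fusion with hand-set weights, which may differ from the paper's scheme.

```python
def fuse_detections(scores, weights=None, threshold=0.5):
    """Fuse confidence scores from several independent people detectors.

    scores: per-window confidences in [0, 1], one entry per detector.
    A window is accepted when the weighted mean confidence passes `threshold`,
    so agreement between weak detectors can outvote one confident mistake.
    """
    weights = weights or [1.0] * len(scores)
    fused = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return fused, fused >= threshold
```

The robustness claim of such schemes rests on the detectors' errors being largely independent: a shadow that fools a motion-based detector rarely also fools an appearance-based one.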
  • Autonomous and Adaptive Learning of Shadows for Surveillance

    Publication Year: 2008 , Page(s): 59 - 62
    PDF (433 KB) | HTML

    Object detection is a critical step in automating monitoring and surveillance tasks. To maximize its reliability, robust algorithms are needed to separate real objects from moving shadows. In this paper we propose a framework for detecting moving shadows cast by moving objects in video, which first learns, autonomously and on-line, the characteristic features of typical shadow pixels at various parts of the observed scene. The collected knowledge is then used to calibrate the system for the given scene and to identify shadow pixels in subsequent frames. Experiments show that our system performs well while being adaptable and using only brightness information.

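An on-line, brightness-only shadow model in the spirit of this paper can be sketched as learning the typical foreground/background brightness ratio of shadow pixels in a scene region, then labelling a pixel as shadow when its ratio falls inside the learned band. The ratio statistic and the acceptance band are illustrative assumptions, not the paper's learned features.

```python
import numpy as np

class ShadowModel:
    """Learns the brightness attenuation typical of shadows in a scene region.

    During the learning phase, candidate shadow pixels (darker than the
    background model) feed running statistics of the foreground/background
    brightness ratio. At detection time, a pixel is labelled shadow when its
    ratio falls within `band` standard deviations of the learned mean.
    """

    def __init__(self, band=2.0):
        self.band = band
        self.ratios = []

    def learn(self, fg_brightness, bg_brightness):
        self.ratios.append(fg_brightness / bg_brightness)

    def is_shadow(self, fg_brightness, bg_brightness):
        r = np.asarray(self.ratios)
        mean, std = r.mean(), r.std()
        ratio = fg_brightness / bg_brightness
        return abs(ratio - mean) <= self.band * max(std, 1e-3)
```

Keeping one such model per scene region is what lets the system adapt to spatially varying illumination, e.g. shadows being darker near a wall than in open ground.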