
2010 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)

Date: 12-14 April 2010


Displaying Results 1 - 25 of 49
  • On the use of audio events for improving video scene segmentation

    Publication Year: 2010, Page(s): 1 - 4

    This work deals with the problem of automatic temporal segmentation of a video into elementary semantic units known as scenes. Its novelty lies in the use of high-level audio information in the form of audio events for the improvement of scene segmentation performance. More specifically, the proposed technique is built upon a recently proposed audio-visual scene segmentation approach that involves the construction of multiple scene transition graphs (STGs) that separately exploit information coming from different modalities. In the extension of the latter approach presented in this work, audio event detection results are introduced to the definition of an audio-based scene transition graph, while a visual-based scene transition graph is also defined independently. The results of these two types of STGs are subsequently combined. The application of the proposed technique to broadcast videos demonstrates the usefulness of audio events for scene segmentation.

  • Region-based caption text extraction

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (1)

    This paper presents a method for caption text detection. The proposed method will be included in a generic indexing system dealing with other semantic concepts that are to be automatically detected as well. To obtain a coherent detection system, the various object detection algorithms use a common image description; in our framework, this is a hierarchical region-based image model. The proposed method takes advantage of texture and geometric features to detect caption text. Texture features are estimated using wavelet analysis and mainly applied to text candidate spotting. Text characteristics verification is, in turn, carried out relying on geometric features, which are estimated exploiting the region-based image model. Analysis of the region hierarchy provides the final caption text objects. The final consistency analysis of the output is performed by a binarization algorithm that robustly estimates the thresholds on the caption text area of support.

  • K-NN boosting prototype learning for object classification

    Publication Year: 2010, Page(s): 1 - 4

    Object classification is a challenging task in computer vision. Many approaches have been proposed to extract meaningful descriptors from images and classify them in a supervised learning framework. In this paper, we revisit the classic k-nearest neighbors (k-NN) classification rule, which has been shown to be very effective when dealing with local image descriptors. However, k-NN still has some major drawbacks, mainly due to the uniform voting among the nearest prototypes in the feature space. We propose a generalization of the classic k-NN rule in a supervised learning (boosting) framework. Namely, we redefine the voting rule as a strong classifier that linearly combines predictions from the k closest prototypes. To induce this classifier, we propose a novel learning algorithm, MLNN (Multiclass Leveraged Nearest Neighbors), which gives a simple procedure for performing prototype selection very efficiently. We tested our method on 12 categories of objects and observed significant improvement over classic k-NN in terms of classification performance.

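The leveraged voting rule described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name, the Euclidean distance metric, and the way the coefficients are supplied are all assumptions, and MLNN's boosting procedure for learning those coefficients is not shown.

```python
import numpy as np

def leveraged_knn_predict(x, prototypes, labels, alphas, k, n_classes):
    """Classify x by a weighted vote of its k nearest prototypes.

    In classic k-NN every neighbor votes with weight 1; here each
    prototype j carries a leveraging coefficient alphas[j] (hypothetical
    inputs -- MLNN learns them by boosting, which is not shown here).
    """
    dists = np.linalg.norm(prototypes - x, axis=1)   # distances to all prototypes
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    scores = np.zeros(n_classes)
    for j in nearest:
        scores[labels[j]] += alphas[j]               # weighted, not uniform, vote
    return int(np.argmax(scores))
```

With all coefficients equal to 1 this reduces exactly to classic k-NN voting, which is why the paper calls it a generalization.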
  • A new evaluation criterion for point correspondences in stereo images

    Publication Year: 2010, Page(s): 1 - 4

    In this paper, we present a new criterion to evaluate point correspondences within a stereo setup. Many applications such as stereo matching, triangulation, lens distortion correction, and camera calibration require an evaluation criterion, indicating how well point correspondences fit to the epipolar geometry. The common criterion here is the epipolar distance. Since the epipolar geometry is often derived from noisy and partially corrupted data, an uncertainty regarding the estimation of the epipolar distance arises. However, the uncertainty of the epipolar geometry, in the shape of the covariance matrix of an epipolar line, provides additional information, and our approach utilizes this information for a new distance measure. The basic idea behind our criterion is to determine the most probable epipolar geometry that explains the point correspondence in the two views. Furthermore, we show that using Lagrange multipliers, this constrained minimization problem can be reduced to solving a set of three linear equations. View full abstract»

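The common epipolar-distance criterion that this paper takes as its baseline can be written directly. A minimal sketch (the function name and the homogeneous-coordinate convention are ours); the paper's contribution, weighting this distance by the covariance of the epipolar line, is not reproduced here.

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Geometric distance from x2 to the epipolar line of x1.

    x1, x2: corresponding points in homogeneous coordinates (3-vectors).
    F: 3x3 fundamental matrix mapping a point in image 1 to an epipolar
    line in image 2.  A perfect correspondence satisfies x2^T F x1 = 0.
    """
    line = F @ x1                                  # epipolar line (a, b, c) in image 2
    return abs(x2 @ line) / np.hypot(line[0], line[1])
```

For a rectified stereo pair (pure horizontal translation) the epipolar lines are image rows, so the distance reduces to the row difference of the two points.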
  • Local homography estimation using keypoint descriptors

    Publication Year: 2010, Page(s): 1 - 4

    This paper presents a new learning-based approach to estimating the local homography of a given 3D object and shows that it is more accurate than specific affine region detection methods. Unlike previous works, which attempt this by adapting iterative algorithms, our method introduces a direct estimation. It performs in three steps. First, a training set of features captures geometry and appearance information about keypoints taken from multiple views of the object. Then, incoming keypoints are matched against the database in order to retrieve a cluster of features representing their identity. Finally, the retrieved clusters are used to estimate the local pose of the patches.

  • A Cognitive Source Coding scheme for Multiple Description 3DTV transmission

    Publication Year: 2010, Page(s): 1 - 4

    Multiple Description Coding (MDC) has recently proved to be an effective solution for the robust transmission of 3D video sequences over unreliable channels. The paper presents a novel Cognitive Source Coding scheme that improves the performance of traditional MDC schemes by adaptively combining traditional predictive and Wyner-Ziv coding according to the characteristics of the video sequence and to the channel conditions. The approach is applied to video+depth 3D transmission and improves the average PSNR value by up to 2.5 dB with respect to traditional MDC schemes.

  • Forensic reasoning upon pre-obtained surveillance metadata using uncertain spatio-temporal rules and subjective logic

    Publication Year: 2010, Page(s): 1 - 4

    The majority of recent work in forensic analysis of visual surveillance content has focused on automatic information extraction. However, little attention has been paid to the intelligent reuse of the extracted (meta)data. For reasoning upon such pre-acquired metadata, in our previous work we proposed the use of logic programming to represent human knowledge and the use of subjective logic to handle the uncertainty implied in the extracted data and the logical rules. In this paper, we further explore the proposed approach for analyzing the relationship between two persons and, more specifically, for estimating whether one person could serve as a witness of another person in a public area scene. We first develop a rule-based model for the likelihood of being a good witness that uses metadata extracted by a person tracker and evaluates the relationship between the tracked persons. To cope with the uncertainty in the relationship model, we develop a reputational subjective opinion function for the spatio-temporal relations. In addition, we accumulate the acquired opinions over time using subjective logic's fusion operator. To verify our approach, we finally present a preliminary experimental case study.

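The fusion step mentioned above accumulates two opinions into one. The sketch below implements the standard cumulative fusion operator on binomial subjective opinions (belief, disbelief, uncertainty); the paper's reputational opinion functions and spatio-temporal rules are not reproduced, and the tuple representation is our choice.

```python
def fuse(op1, op2):
    """Cumulative fusion of two binomial subjective opinions.

    Each opinion is (belief, disbelief, uncertainty) with b + d + u = 1.
    Fusing two independent opinions reduces uncertainty, which is how
    such a framework accumulates evidence over time.
    """
    b1, d1, u1 = op1
    b2, d2, u2 = op2
    denom = u1 + u2 - u1 * u2       # assumes u1 and u2 are not both zero
    return ((b1 * u2 + b2 * u1) / denom,
            (d1 * u2 + d2 * u1) / denom,
            (u1 * u2) / denom)
```

Fusing an opinion with an equally confident copy of itself keeps the belief/disbelief ratio but shrinks the uncertainty, exactly the accumulation-over-time behavior the abstract describes.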
  • AIR: Architecture for interoperable retrieval on distributed and heterogeneous multimedia repositories

    Publication Year: 2010, Page(s): 1 - 4

    Nowadays, multimedia data is produced and consumed at an ever increasing rate, and diverse storage approaches for multimedia data have been introduced alongside this trend. As a result, distributed and heterogeneous multimedia repositories exist, while unified and easy access to the stored multimedia data is not given. This paper presents an architecture, named AIR, that offers such unified retrieval possibilities. To ensure interoperability, AIR makes use of recently issued standards, namely the MPEG Query Format (MPQF), a multimedia query language, and the JPSearch transformation rules for metadata interoperability.

  • Local Invariant Feature Tracks for high-level video feature extraction

    Publication Year: 2010, Page(s): 1 - 4

    This paper builds upon previous work on local interest point detection and description to propose the extraction and representation of novel Local Invariant Feature Tracks (LIFT). These features compactly capture not only the spatial attributes of 2D local regions, as in SIFT and related techniques, but also their long-term trajectories in time. This and other desirable properties of LIFT allow the generation of Bags-of-Spatiotemporal-Words models that facilitate capturing the dynamics of video content, which is necessary for detecting high-level video features that by definition have a strong temporal dimension. Preliminary experimental evaluation and comparison of the proposed approach reveals promising results.

  • An efficient prefetching strategy for remote browsing of JPEG 2000 image sequences

    Publication Year: 2010, Page(s): 1 - 4

    This paper proposes an efficient prefetching strategy for interactive remote browsing of sequences of high resolution JPEG 2000 images. Because of the inherent latency of client-server communication, the experiments of this study show that a significant benefit can be achieved, in terms of both quality and responsiveness, by prefetching certain data from the rest of the sequence while an image is being explored. This paper proposes a model based on the quality progression of the image in order to estimate what percentage of the bandwidth should be dedicated to prefetching data. This solution can be easily implemented on top of any existing remote browsing architecture.

  • Comparing spatial masking modelling in just noticeable distortion controlled H.264/AVC video coding

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (1)

    This paper studies the integration of a just noticeable distortion model in an H.264/AVC standard codec to improve the final rate-distortion performance. Three masking aspects related to lossy transform coding and natural video content are considered: frequency band decomposition, luminance component variations and pattern masking. For the latter aspect, three alternative models are considered, namely the Foley-Boynton, adaptive Foley-Boynton and Wei-Ngan models. Their performance, measured on high definition video content and reported in terms of bitrate improvement and objective quality loss, reveals that the Foley-Boynton model and its adaptive version provide the best performance, with up to 35.6% bitrate reduction at the cost of at most 1.4% objective quality loss.

  • Blind estimation of the QP parameter in H.264/AVC decoded video

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (3)

    No-reference video quality monitoring algorithms rely on data collected at the receiver side. Typically, these methods assume the availability of the bitstream, so that motion vectors, coding modes and prediction residuals can be readily extracted. In this paper we show that, even without the availability of the bitstream, the decoded video sequence can be reverse engineered in order to reveal part of its coding history. Specifically, we illustrate a method for blindly estimating the quantization parameter (QP) in H.264/AVC decoded video on a frame-by-frame basis. We demonstrate by means of extensive simulations the robustness of the proposed algorithm. We discuss its usefulness in the field of video quality assessment (e.g. to perform blind PSNR estimation) and we provide an outlook on video forensics tools enabled by the proposed method (e.g. to detect temporal cropping/merging).

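As a toy illustration of why the coding history is recoverable at all: ideal dequantized transform coefficients are integer multiples of the quantization step, so the step survives in the decoded data. The sketch below recovers it by a GCD in the noise-free integer case; this is our simplification, not the paper's algorithm, which works on decoded pixels and must additionally cope with rounding and prediction.

```python
from functools import reduce
from math import gcd

def estimate_quant_step(coeffs):
    """Recover the quantization step from ideal dequantized coefficients.

    Dequantization reconstructs each coefficient as level * step, so in
    the noise-free integer case the step is simply the GCD of the
    nonzero reconstructed values.
    """
    vals = [abs(int(c)) for c in coeffs if int(c) != 0]
    return reduce(gcd, vals) if vals else 0
```

In H.264/AVC the step grows roughly exponentially with QP, so an estimate of the step translates into an estimate of the QP value the abstract refers to.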
  • Coherent video reconstruction with motion estimation at the decoder

    Publication Year: 2010, Page(s): 1 - 4

    In traditional predictive video coding, block matching is performed at the encoder. The obtained motion field is then transmitted to the decoder, together with the prediction residue. Nevertheless, if the motion field is not provided, it can be reconstructed, as long as the decoder manages to exploit some correlated information. This paper presents an algorithm for motion estimation at the decoder side, given the prediction residue only. The main novelty of this algorithm lies in the contextual reconstruction of a frame region composed of several blocks. Simulation results show that taking into account a whole row of blocks can significantly improve the results obtained with an algorithm that reconstructs each block separately.

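A toy version of decoder-side motion estimation for a single block might look like the following: try each candidate vector, reconstruct the block from the reference frame plus the received residue, and keep the candidate most consistent with already-decoded pixels. The boundary-matching criterion and all names are our assumptions; the paper's contribution, jointly reconstructing several blocks of a row, is not reproduced.

```python
import numpy as np

def estimate_mv(ref, residue, frame, y, x, bs, search):
    """Pick a motion vector for the bs-by-bs block at (y, x), given only
    the residue.

    For each candidate vector the block is reconstructed as
    ref[shifted] + residue, and the candidate whose reconstruction best
    matches the already-decoded row just above the block wins.
    """
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bs > ref.shape[0] or xx + bs > ref.shape[1]:
                continue
            rec = ref[yy:yy + bs, xx:xx + bs] + residue
            # discontinuity between rec's top row and the row above the block
            cost = np.abs(rec[0] - frame[y - 1, x:x + bs]).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Reconstructing several blocks jointly, as the paper does, gives the criterion more context than a single block boundary and makes the search far better conditioned.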
  • Semi-automatic object tracking in video sequences by extension of the MRSST algorithm

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (1)

    The objective of this work is to investigate a new approach to object segmentation in videos. While some amount of user interaction is still necessary for most algorithms in this field, it can be reduced by making use of certain properties of graph-based image segmentation algorithms. Based on one of these algorithms, a framework is proposed that tracks individual foreground objects through arbitrary video sequences and partly automates the corrections required from the user. Experimental results suggest that the proposed algorithm performs well on both low- and high-resolution video sequences and can even cope with motion blur.

  • A multi-resolution particle filter tracking with a dual consistency check for model update in a multi-camera environment

    Publication Year: 2010, Page(s): 1 - 4

    This paper presents a novel tracking method with a multi-resolution technique and a dual consistency check for model update, to track a non-rigid target in an uncalibrated static multi-camera environment. It is based on particle filter methods using a color appearance model. Compared to our previous work, the performance of the tracking system is improved by proposing: i) a dual consistency check using the Kolmogorov-Smirnov test to evaluate the consistency of the target estimate, and ii) a camera interaction step using the weighted least-squares method to compute an adaptive camera transformation matrix, which is used to relocate the estimate in one camera from those in other cameras when tracking failure happens. Tested in our multi-camera environment on single-person tracking, the method achieves a lower failure rate and better tracking precision compared to mono-camera tracking.

  • Activity detection using regular expressions

    Publication Year: 2010, Page(s): 1 - 4

    In this paper we propose a new method for trajectory analysis in surveillance scenarios using context-free grammars. Starting from a predefined set of activities, we provide a tool to compare incoming paths with stored templates, analyzing the sequence of samples at a syntactic level. Using this approach it is possible to perform trajectory matching at different abstraction layers, retrieving for example recurrent motion patterns or anomalous activities. The implemented system has been validated in indoor scenarios, with activity monitoring for assisted living applications as the main objective. The results demonstrate the capability of the framework in recognizing known motion patterns, as well as in detecting the presence of unknown actions, which are classified as anomalous.

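The syntactic-matching idea can be illustrated with ordinary regular expressions, a weaker formalism than the context-free grammars the paper uses: quantize each trajectory step into a direction symbol, then match the resulting string against activity templates, falling back to "anomalous" when nothing matches. The alphabet and the templates below are invented for illustration.

```python
import re

def directions(traj):
    """Quantize a trajectory (list of (x, y) points, y pointing up) into
    a string over a four-symbol direction alphabet: R, L, U, D."""
    syms = []
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        dx, dy = x1 - x0, y1 - y0
        syms.append('R' if abs(dx) >= abs(dy) and dx >= 0 else
                    'L' if abs(dx) >= abs(dy) else
                    'U' if dy > 0 else 'D')
    return ''.join(syms)

# Hypothetical activity templates, expressed as regular expressions.
TEMPLATES = {
    'corridor_walk': re.compile(r'R{3,}'),        # sustained rightward motion
    'loitering':     re.compile(r'(RL|LR){2,}'),  # back-and-forth pattern
}

def classify(traj):
    """Return the activities whose template matches, or ['anomalous']."""
    s = directions(traj)
    hits = [name for name, pat in TEMPLATES.items() if pat.search(s)]
    return hits or ['anomalous']
```

A context-free grammar, as used in the paper, additionally allows nested and recursive patterns (e.g. "enter, do anything, leave") that regular expressions cannot express.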
  • Shape adaptive mean shift object tracking using Gaussian mixture models

    Publication Year: 2010, Page(s): 1 - 4

    GMM-SAMT, a new object tracking algorithm based on a combination of the mean shift principle and Gaussian mixture models, is presented. GMM-SAMT uses an asymmetric, shape-adapted kernel instead of the symmetric kernel of traditional mean shift tracking. During the mean shift iterations, the kernel scale is altered according to the object scale, providing an initial adaptation to the object shape. The final shape of the kernel is then obtained by segmenting the area inside and around the adapted kernel into object and non-object segments using Gaussian mixture models.

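For context, a single iteration of the plain symmetric-kernel mean shift that GMM-SAMT generalizes can be sketched as below: the window center moves to the weighted centroid of the pixel weights (e.g. a color-likelihood backprojection) under the kernel. The flat circular kernel and all names are our simplifications; the paper's asymmetric shape-adapted kernel and the GMM segmentation step are not shown.

```python
import numpy as np

def mean_shift_step(weights, center, bandwidth):
    """One mean shift iteration over a 2D weight map.

    weights: per-pixel likelihood of belonging to the target.
    center:  current window center as (x, y).
    Returns the new center: the weighted centroid of the weights inside
    a flat circular kernel of the given bandwidth.
    """
    h, w = weights.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= bandwidth ** 2
    wts = weights * mask
    total = wts.sum()
    if total == 0:
        return center                       # nothing under the kernel; stay put
    return (float((xs * wts).sum() / total), float((ys * wts).sum() / total))
```

Iterating this step until the center stops moving converges on a local mode of the weight map, which is the "mean shift principle" the abstract builds on.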
  • Exploring temporal aspects in user-tag co-clustering

    Publication Year: 2010, Page(s): 1 - 4

    Tagging environments have lately become an interesting topic of research, focused mainly on clustering approaches that extract emergent patterns derived from tag similarity and involving tag relations or user interconnections. Apart from tag similarity, an interesting parameter to analyze during the clustering/mining process in such data is the actual time at which each tagging activity occurred. Indeed, adding a temporal dimension unfolds macroscopic and microscopic views of tagging, highlights links between objects for specific time periods and, in general, lets us observe how the users' tagging activity changes over time. In this article, we propose a time-aware user/tag clustering approach, which groups together similar users and tags that are very “active” during the same time periods. Emphasis is given to using varying time scales, so that we distinguish between clusters that are robust at many time scales and clusters that are occasional, i.e. that emerge only at a specific time period.

  • Gaze movement inference for implicit image annotation

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (1)

    An innovative semi-automatic image annotation system is enriched with feedback from the user's eyes. The system implicitly exploits the competence of the human mind and utilizes the computational power of computers in order to achieve pervasive and accurate annotation. The method requires minimal user interaction, which makes it suitable for use in distributed environments while users perform their usual daily surfing. The user's gaze on the trial screen is monitored and interpreted by an interface based on fuzzy inference. The preliminary results indicate that, in a multi-user environment, the annotation precision of the system is over 80%, with recall between 60% and 80%.

  • 3D object duplicate detection for video retrieval

    Publication Year: 2010, Page(s): 1 - 4

    Content-based video retrieval has become a very active research area in the last decade due to the increasing amount of video content shared on social networks such as YouTube and DailyMotion. While most content-based video retrieval approaches employ low-level visual features for global analysis of the video, this paper proposes an object-based retrieval method as an alternative. The goal of the proposed method is to retrieve key frames and shots of a video that contain a particular object. The key idea is to apply an existing object duplicate detection method iteratively to the video sequence in order to compensate for 3D view variations, illumination changes and partial occlusions. Our approach combines viewpoint-invariant region descriptors to describe the appearance of an object using a graph model which considers the spatial layout of the individual regions. Given a query object provided by the user in the form of an image and a region of interest, the system retrieves shots containing this object by analyzing a set of key frames for each shot. The robustness of our approach is demonstrated using a video in which a 3D object is recorded from different viewpoints and with partial occlusions.

  • Improving scalable video adaptation in a knowledge-based framework

    Publication Year: 2010, Page(s): 1 - 4

    In a knowledge-based content adaptation framework, video adaptation can be performed in a series of steps, named conversions. The high-level decision phase in such a framework occasionally encounters several feasible parameter values for a specific conversion. This paper proposes to transfer further decisions to a low-level phase that decides which parameters maximise the quality of the adaptation. In particular, when more than one solution is available, an innovative quality measure is used to select the best parameter values among the set of values that fulfil the adaptation constraints in the case of scalable video.

  • Wikipedia based semantic metadata annotation of audio transcripts

    Publication Year: 2010, Page(s): 1 - 4

    A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords, using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason, a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating out-of-topic/out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories.

  • User study of the free-eye photo browsing interface

    Publication Year: 2010, Page(s): 1 - 4

    The striking proliferation of user-generated as well as broadcast visual content has prompted a high demand for effective content management tools and interfaces for search and browsing of visual media. This paper presents a novel, intuitive, interactive interface for browsing large-scale image collections. It visualises the underlying structure of the dataset through size and spatial relations. To achieve this, images are initially clustered using an unsupervised graph-based clustering algorithm. By selecting images that are hierarchically laid out on the screen, the user can intuitively navigate through the collection or search for specific content. Experimental results based on user evaluation of photo search, browsing and selection demonstrate good usability of the presented system and an improvement over standard methods for interaction with large-scale image collections.

  • Browsing news archives from the perspective of history: The Papyrus Browser Historiographical Issues View

    Publication Year: 2010, Page(s): 1 - 4

    News Archives constitute an important source for historians, both for research and educational purposes. However, access to their material is not easy due to the special characteristics of the archival content as well as the possible difference between the historian's vocabulary and that of the Archive. In the context of the Papyrus EU-funded project, the requirements of historians have been investigated and taken into account for the creation of a specialized Web-based tool, the Papyrus Browser. This paper focuses on the description of the requirements that lead to the design of this tool and provides a detailed description of its main view, the Historiographical Issues View. Design and implementation issues are discussed, as well as plans for future work on the tool.

  • The MUG facial expression database

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (7)

    This paper presents a new extended collection of posed and induced facial expression image sequences. All sequences were captured in a controlled laboratory environment at high resolution and with no occlusions. The collection consists of two parts: the first part depicts eighty-six subjects performing the six basic expressions according to the “emotion prototypes” defined in the Investigator's Guide of the FACS manual. The second part contains the same subjects recorded while they were watching an emotion-inducing video. Most of the database recordings are available to the scientific community. Beyond the emotion-related annotation, the database also contains manual and automatic annotations of 80 facial landmark points for a significant number of frames. The database contains sufficient material for the development and statistical evaluation of facial expression recognition systems using posed and induced expressions.