2009 IEEE 12th International Conference on Computer Vision (ICCV 2009)

Date: Sept. 29 - Oct. 2, 2009

Displaying Results 1 - 25 of 331
  • Table of contents

    Page(s): i - xvii
  • Preface

    Page(s): xviii
  • Message from the Program Chairs

    Page(s): xix - xx
  • Organizing Committee

    Page(s): xxi - xxix
  • Corporate sponsors

    Page(s): xxx
  • Oral session 1: Segmentation I

    Page(s): 1
  • Decomposing a scene into geometric and semantically consistent regions

    Page(s): 1 - 8

    High-level, or holistic, scene understanding involves reasoning about objects, regions, and the 3D relationships between them. This requires a representation above the level of pixels that can be endowed with high-level attributes such as class of object/region, its orientation, and (rough 3D) location within the scene. Towards this goal, we propose a region-based model which combines appearance and scene geometry to automatically decompose a scene into semantically meaningful regions. Our model is defined in terms of a unified energy function over scene appearance and structure. We show how this energy function can be learned from data and present an efficient inference technique that makes use of multiple over-segmentations of the image to propose moves in the energy-space. We show, experimentally, that our method achieves state-of-the-art performance on the tasks of both multi-class image segmentation and geometric reasoning. Finally, by understanding region classes and geometry, we show how our model can be used as the basis for 3D reconstruction of the scene.
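
    The kind of energy function described here can be illustrated generically. The sketch below is a toy region-labeling energy with per-region appearance costs plus a pairwise penalty for geometrically incompatible adjacent labels; all structures and numbers are illustrative, not the paper's actual model.

```python
import numpy as np

# Toy region-labeling energy: unary appearance costs plus a penalty for
# adjacent regions whose labels are geometrically incompatible.
# Everything here is illustrative, not the paper's actual model.
def region_energy(labels, unary, adjacency, incompatible, lam=1.0):
    """labels: label per region; unary: (R, C) per-region label costs;
    adjacency: pairs of region indices; incompatible: set of unordered
    label pairs that incur an extra penalty lam."""
    e = float(sum(unary[i, l] for i, l in enumerate(labels)))
    for i, j in adjacency:
        if frozenset((labels[i], labels[j])) in incompatible:
            e += lam
    return e

# Three regions in a chain, two labels (say 0 = sky, 1 = ground); a
# sky/ground adjacency is treated as geometrically inconsistent here.
unary = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]])
adjacency = [(0, 1), (1, 2)]
incompatible = {frozenset((0, 1))}
good = region_energy([0, 0, 0], unary, adjacency, incompatible)
```

    Inference then amounts to searching for the labeling (and region decomposition) minimizing such an energy, which the paper does via moves proposed from multiple over-segmentations.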

  • Boundary ownership by lifting to 2.1D

    Page(s): 9 - 16

    This paper addresses the “boundary ownership” problem, also known as the figure/ground assignment problem. Estimating boundary ownerships is a key step in perceptual organization: it allows higher-level processing to be applied on non-accidental shapes corresponding to figural regions. Existing methods for estimating the boundary ownerships for a given set of boundary curves model the probability distribution function (PDF) of the binary figure/ground random variables associated with the curves. Instead of modeling this PDF directly, the proposed method uses the 2.1D model: it models the PDF of the ordinal depths of the image segments enclosed by the curves. After this PDF is maximized, the boundary ownership of a curve is determined according to the ordinal depths of the two image segments it abuts. This method has two advantages: first, boundary ownership configurations inconsistent with every depth ordering (and thus very likely to be incorrect) are eliminated from consideration; second, it allows for the integration of cues related to image segments (not necessarily adjacent) in addition to those related to the curves. The proposed method models the PDF as a conditional random field (CRF) conditioned on cues related to the curves, T-junctions, and image segments. The CRF is formulated using learnt non-parametric distributions of the cues. The method significantly improves the currently achieved figure/ground assignment accuracy, with 20.7% fewer errors on the Berkeley Segmentation Dataset.

  • Curvature regularity for region-based image segmentation and inpainting: A linear programming relaxation

    Page(s): 17 - 23

    We consider a class of region-based energies for image segmentation and inpainting which combine region integrals with curvature regularity of the region boundary. To minimize such energies, we formulate an integer linear program which jointly estimates regions and their boundaries. Curvature regularity is imposed by respective costs on pairs of adjacent boundary segments. By solving the associated linear programming relaxation and thresholding the solution one obtains an approximate solution to the original integer problem. To our knowledge this is the first approach to impose curvature regularity in region-based formulations in a manner that is independent of initialization and allows one to compute a bound on the optimal energy. In a variety of experiments on segmentation and inpainting, we demonstrate the advantages of higher-order regularity. Moreover, we demonstrate that for most experiments the optimality gap is smaller than 2% of the global optimum. For many instances we are even able to compute the global optimum.
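
    The relax-and-threshold strategy described here can be demonstrated on a tiny generic integer program (a triangle vertex cover, not the paper's segmentation ILP). The sketch assumes SciPy is available; the LP value is a certified lower bound on the integer optimum, which is exactly the kind of bound the paper computes for its curvature energies.

```python
import numpy as np
from scipy.optimize import linprog

# Relax-and-threshold on a toy integer program: minimum vertex cover of a
# triangle. Variables x_i in {0, 1}; each edge (i, j) needs x_i + x_j >= 1.
# A generic illustration, not the paper's curvature ILP.
edges = [(0, 1), (1, 2), (0, 2)]
c = np.ones(3)                         # minimize the number of chosen vertices
A_ub = np.zeros((len(edges), 3))       # encode -(x_i + x_j) <= -1 per edge
for row, (i, j) in enumerate(edges):
    A_ub[row, i] = A_ub[row, j] = -1.0
b_ub = -np.ones(len(edges))

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * 3)
lp_bound = res.fun                     # 1.5: LP relaxation value, a lower bound
x_int = (res.x >= 0.5).astype(int)     # thresholding yields a feasible cover
```

    Here the LP optimum is fractional (all x_i = 0.5), so thresholding gives a feasible but suboptimal integer solution; the gap between lp_bound and the thresholded cost is the optimality gap the abstract quantifies.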

  • Oral session 2: Human detection

    Page(s): 1
  • Human detection using partial least squares analysis

    Page(s): 24 - 31

    Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes. View full abstract»
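
    PLS itself is a standard technique; a minimal single-response NIPALS sketch (not the authors' code) shows how a high-dimensional, multicollinear feature matrix is projected onto a few latent directions chosen to covary with the labels, after which a simple classifier or regressor operates on the scores.

```python
import numpy as np

# Minimal single-response NIPALS PLS: extract k latent directions that
# covary with the label y, then regress y on the resulting scores.
# A generic sketch of the technique, not the authors' implementation.
def pls1(X, y, k):
    Xd, yd = X - X.mean(0), y - y.mean()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)                 # direction of max covariance
        t = Xd @ w
        tt = t @ t
        p = Xd.T @ t / tt                      # loading for deflation
        W.append(w); P.append(p); q.append(yd @ t / tt)
        Xd -= np.outer(t, p); yd -= q[-1] * t  # deflate X and y
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)             # X_centered @ R gives the scores
    return R, np.array(q)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                 # n << d, as in the detector
y = X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=60)
R, q = pls1(X, y, k=5)
scores = (X - X.mean(0)) @ R                   # 60 x 5 low-dimensional projection
```

    As in the paper, the projection is supervised: unlike PCA, the latent directions are chosen for discriminative (label) covariance, which is why very few dimensions suffice.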

  • An HOG-LBP human detector with partial occlusion handling

    Page(s): 32 - 39

    By combining Histograms of Oriented Gradients (HOG) and Local Binary Pattern (LBP) as the feature set, we propose a novel human detection approach capable of handling partial occlusion. Two kinds of detectors, i.e., global detector for whole scanning windows and part detectors for local regions, are learned from the training data using linear SVM. For each ambiguous scanning window, we construct an occlusion likelihood map by using the response of each block of the HOG feature to the global detector. The occlusion likelihood map is then segmented by the mean shift approach. The segmented portion of the window with a majority of negative response is inferred as an occluded region. If partial occlusion is indicated with high likelihood in a certain scanning window, part detectors are applied on the unoccluded regions to achieve the final classification on the current scanning window. With the help of the augmented HOG-LBP feature and the global-part occlusion handling method, we achieve a detection rate of 91.3% at FPPW = 10^-6, 94.7% at FPPW = 10^-5, and 97.9% at FPPW = 10^-4 on the INRIA dataset, which, to the best of our knowledge, is the best human detection performance on the INRIA dataset. The global-part occlusion handling method is further validated using synthesized occlusion data constructed from the INRIA and PASCAL datasets.
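
    The LBP half of the descriptor is easy to sketch in a few lines of NumPy. Below is a basic 8-neighbour LBP histogram (a generic sketch, not the paper's cell-structured variant); concatenating such a texture histogram with a HOG block vector gives an augmented HOG-LBP style feature.

```python
import numpy as np

# Basic 8-neighbour LBP: threshold each neighbour against the centre pixel
# and pack the eight bits into one byte per pixel; the histogram of codes
# over a window is the texture half of an HOG-LBP style descriptor.
def lbp_codes(img):
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=np.uint8)
    neigh = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(neigh):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img):
    return np.bincount(lbp_codes(img).ravel(), minlength=256)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(16, 16))
hist = lbp_histogram(patch)                 # 256-bin texture descriptor
# feature = np.concatenate([hog_vector, hist])  # hog_vector: any HOG output
```

    The paper's uniform-pattern LBP over cells follows the same thresholding idea, with the histogram restricted to uniform codes and computed per cell.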

  • Max-margin additive classifiers for detection

    Page(s): 40 - 47

    We present methods for training high quality object detectors very quickly. The core contribution is a pair of fast training algorithms for piece-wise linear classifiers, which can approximate arbitrary additive models. The classifiers are trained in a max-margin framework and significantly outperform linear classifiers on a variety of vision datasets. We report experimental results quantifying training time and accuracy on image classification tasks and pedestrian detection, including detection results better than the best previously published on the INRIA dataset, with faster training.
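
    A piecewise-linear additive model can be realized with the standard "hat-function" encoding: each feature value is split linearly between its two nearest knots, so any linear classifier trained on the encoding is an additive, piecewise-linear function of the original features. A generic sketch under assumed uniform knots (not the authors' solver):

```python
import numpy as np

# Hat (linear-interpolation) encoding: each feature dimension is expanded
# into B bins on a shared knot grid, with its value split between the two
# nearest knots. A linear model on this encoding is additive and
# piecewise linear in the original features.
def hat_encode(X, knots):
    n, d = X.shape
    B = len(knots)
    out = np.zeros((n, d * B))
    for j in range(d):
        idx = np.clip(np.searchsorted(knots, X[:, j]) - 1, 0, B - 2)
        lo, hi = knots[idx], knots[idx + 1]
        w = (X[:, j] - lo) / (hi - lo)      # split the value between two knots
        rows = np.arange(n)
        out[rows, j * B + idx] = 1.0 - w
        out[rows, j * B + idx + 1] = w
    return out

knots = np.linspace(0.0, 1.0, 11)
X = np.random.default_rng(0).uniform(0.05, 0.95, size=(4, 3))
Z = hat_encode(X, knots)                    # 4 x 33 encoding, 2 nonzeros per dim
```

    Because each input value is reconstructed exactly by interpolating its two active knots, the encoding loses nothing within the knot range while letting a fast linear (max-margin) trainer fit an additive nonlinear decision function.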

  • Oral session 3: Learning

    Page(s): 1
  • Kernel methods for weakly supervised mean shift clustering

    Page(s): 48 - 55

    Mean shift clustering is a powerful unsupervised data analysis technique which does not require prior knowledge of the number of clusters, and does not constrain the shape of the clusters. The data association criterion is based on the underlying probability distribution of the data points, which is defined in advance via the employed distance metric. In many problem domains, the initially designed distance metric fails to resolve the ambiguities in the clustering process. We present a novel semi-supervised kernel mean shift algorithm where the inherent structure of the data points is learned with a few user-supplied constraints in addition to the original metric. The constraints we consider are the pairs of points that should be clustered together. The data points are implicitly mapped to a higher dimensional space induced by the kernel function where the constraints can be effectively enforced. The mode seeking is then performed on the embedded space and the approach preserves all the advantages of the original mean shift algorithm. Experiments on challenging synthetic and real data clearly demonstrate that significant improvements in clustering accuracy can be achieved by employing only a few constraints.
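
    The base procedure being extended is plain Euclidean mean shift, sketched below: every point iteratively moves to the Gaussian-weighted average of its neighbours until it settles on a density mode. The paper performs the same mode seeking in a kernel-induced space with pairwise constraints; this sketch shows only the unsupervised baseline.

```python
import numpy as np

# Plain (unsupervised, Euclidean) mean shift: each point climbs to a mode
# of the Gaussian kernel density estimate. Points that converge to the
# same mode form one cluster; no cluster count or shape is assumed.
def mean_shift(X, bandwidth=1.0, n_iter=100):
    modes = X.copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        modes = (w @ X) / w.sum(axis=1, keepdims=True)
    return modes

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (30, 2)),      # two well-separated blobs
               rng.normal(5.0, 0.2, (30, 2))])
modes = mean_shift(X, bandwidth=0.5)
labels = np.unique(np.round(modes, 1), axis=0, return_inverse=True)[1]
```

    The kernelized, constrained variant in the paper replaces the Euclidean distances above with distances in a learned embedded space where must-link pairs are pulled together.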

  • Finding shareable informative patterns and optimal coding matrix for multiclass boosting

    Page(s): 56 - 63

    A multiclass classification problem can be reduced to a collection of binary problems using an error-correcting coding matrix that specifies the binary partitions of the classes. The final classifier is an ensemble of base classifiers learned on binary problems and its performance is affected by two major factors: the qualities of the base classifiers and the coding matrix. Previous studies either focus on one of these factors or consider the two factors separately. In this paper, we propose a new multiclass boosting algorithm called AdaBoost.SIP that considers both factors simultaneously. In this algorithm, informative patterns, which are shareable by different classes rather than discriminative for only a single specific class, are generated at first. Then the binary partition preferred by each pattern is found by performing stage-wise functional gradient descent on a margin-based cost function. Finally, base classifiers and coding matrix are optimized simultaneously by maximizing the negative gradient of such cost function. The proposed algorithm is applied to scene and event recognition and experimental results show its effectiveness in multiclass classification.
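
    The error-correcting output code (ECOC) reduction itself is standard and easy to sketch: the coding matrix defines one binary partition per column, and a test sample's vector of binary predictions is decoded to the class whose code row it matches best. A generic ECOC sketch (not AdaBoost.SIP itself, which also learns the matrix):

```python
import numpy as np

# Error-correcting output codes: M is a (classes x binary-problems) matrix
# with +/-1 entries; each column defines one binary partition of the
# classes. Decoding picks the class whose code row best matches the
# vector of binary predictions (correlation decoding, equivalent to
# Hamming decoding for +/-1 codes).
M = np.array([[+1, +1, -1],     # an illustrative 3-class coding matrix
              [-1, +1, +1],
              [+1, -1, +1]])

def ecoc_decode(pred, M):
    # pred: (n, L) binary-classifier outputs in {-1, +1}
    return np.argmax(pred @ M.T, axis=1)

pred = np.array([[+1, +1, -1],  # agrees with class 0's code row
                 [+1, -1, +1]]) # agrees with class 2's code row
classes = ecoc_decode(pred, M)
```

    What the paper optimizes is precisely the two ingredients this sketch takes as given: the base classifiers producing pred and the matrix M, learned jointly.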

  • Learning with dynamic group sparsity

    Page(s): 64 - 71

    This paper investigates a new learning formulation called dynamic group sparsity. It is a natural extension of the standard sparsity concept in compressive sensing, and is motivated by the observation that in some practical sparse data the nonzero coefficients are often not random but tend to be clustered. Intuitively, better results can be achieved in these cases by reasonably utilizing both clustering and sparsity priors. Motivated by this idea, we have developed a new greedy sparse recovery algorithm, which prunes data residues in the iterative process according to both sparsity and group clustering priors rather than only sparsity as in previous methods. The proposed algorithm can stably recover sparse data with clustering trends using far fewer measurements and computations than current state-of-the-art algorithms, with provable guarantees. Moreover, our algorithm can adaptively learn the dynamic group structure and the sparsity number if they are not available in practical applications. We have applied the algorithm to sparse recovery and background subtraction in videos. Numerous experiments with improved performance over previous methods further validate our theoretical proofs and the effectiveness of the proposed algorithm.
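
    The greedy loop being extended is the classic matching-pursuit pattern. Below is a plain orthogonal matching pursuit (OMP) baseline, which the paper's algorithm augments with group-clustering priors during pruning; this sketch illustrates only the sparsity-driven part.

```python
import numpy as np

# Plain orthogonal matching pursuit: greedily pick the column most
# correlated with the residual, refit by least squares on the chosen
# support, and repeat. The paper's algorithm adds group-clustering priors
# to this kind of greedy residual pruning.
def omp(A, y, k):
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 80)) / np.sqrt(40)    # random measurement matrix
x_true = np.zeros(80)
x_true[[3, 4, 5]] = [1.0, -2.0, 1.5]           # clustered nonzero support
x_hat = omp(A, A @ x_true, k=3)                # noiseless recovery
```

    The clustered support in this example is exactly the structure a group-sparsity prior exploits: knowing that active coefficients come in runs lets recovery succeed with fewer measurements than sparsity alone.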

  • Oral session 4: Geometry

    Page(s): 1
  • Building Rome in a day

    Page(s): 72 - 79

    We present a system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo sharing sites. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms, designed to maximize parallelism at each stage in the pipeline and minimize serialization bottlenecks. It is designed to scale gracefully with both the size of the problem and the amount of available computation. We have experimented with a variety of alternative algorithms at each stage of the pipeline and report on which ones work best in a parallel computing environment. Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.

  • Reconstructing building interiors from images

    Page(s): 80 - 87

    This paper proposes a fully automated 3D reconstruction and visualization system for architectural scenes (interiors and exteriors). The reconstruction of indoor environments from photographs is particularly challenging due to texture-poor planar surfaces such as uniformly-painted walls. Our system first uses structure-from-motion, multi-view stereo, and a stereo algorithm specifically designed for Manhattan-world scenes (scenes consisting predominantly of piece-wise planar surfaces with dominant directions) to calibrate the cameras and to recover initial 3D geometry in the form of oriented points and depth maps. Next, the initial geometry is fused into a 3D model with a novel depth-map integration algorithm that, again, makes use of Manhattan-world assumptions and produces simplified 3D models. Finally, the system enables the exploration of reconstructed environments with an interactive, image-based 3D viewer. We demonstrate results on several challenging datasets, including a 3D reconstruction and image-based walk-through of an entire floor of a house, the first result of this kind from an automated computer vision system.

  • Is dual linear self-calibration artificially ambiguous?

    Page(s): 88 - 95

    This purely theoretical work investigates the problem of artificial singularities in camera self-calibration. Self-calibration allows one to upgrade a projective reconstruction to metric and has a concise and well-understood formulation based on the Dual Absolute Quadric (DAQ), a rank-3 quadric envelope satisfying (nonlinear) ‘spectral constraints’: it must be positive of rank 3. The practical scenario we consider is the one of square pixels, known principal point and varying unknown focal length, for which generic Critical Motion Sequences (CMS) have been thoroughly derived. The standard linear self-calibration algorithm uses the DAQ paradigm but ignores the spectral constraints. It thus has artificial CMSs, which have barely been studied so far. We propose an algebraic model of singularities based on the confocal quadric theory. It allows one to easily derive all types of CMSs. We first review the already known generic CMSs, for which any self-calibration algorithm fails. We then describe all CMSs for the standard linear self-calibration algorithm; among those are artificial CMSs caused by the above spectral constraints being neglected. We then show how to detect CMSs; when one is detected, it is actually possible to uniquely identify the correct self-calibration solution, based on a notion of signature of quadrics. The main conclusion of this paper is that a posteriori enforcing the spectral constraints in linear self-calibration is discriminant enough to resolve all artificial CMSs.
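
    For reference, the DAQ-based linear formulation being analyzed can be stated compactly; this is standard self-calibration background, specialized to the abstract's square-pixel, known-principal-point assumptions, not the paper's new contribution:

```latex
\[
  \omega^{*}_{i} \;\simeq\; P_i \,\Omega^{*}_{\infty}\, P_i^{\top},
  \qquad
  \omega^{*}_{i} = K_i K_i^{\top},
\]
With square pixels and a known (centred) principal point,
$K_i = \mathrm{diag}(f_i, f_i, 1)$, hence
$\omega^{*}_{i} \propto \mathrm{diag}(f_i^{2}, f_i^{2}, 1)$, which yields the
linear constraints
\[
  (\omega^{*}_{i})_{12} = (\omega^{*}_{i})_{13} = (\omega^{*}_{i})_{23} = 0,
  \qquad
  (\omega^{*}_{i})_{11} = (\omega^{*}_{i})_{22},
\]
on the entries of $\Omega^{*}_{\infty}$. The standard linear algorithm solves
these while ignoring the nonlinear spectral constraints (rank 3, positivity)
on $\Omega^{*}_{\infty}$, which is what introduces the artificial CMSs the
paper studies.
```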

  • Globally optimal affine epipolar geometry from apparent contours

    Page(s): 96 - 103

    We study the problem of estimating the epipolar geometry from apparent contours of smooth curved surfaces with affine camera models. Since apparent contours are viewpoint dependent, the only true image correspondences are projections of the frontier points, i.e., surface points whose tangent planes are also their epipolar planes. However, frontier points are unknown a priori and must be estimated simultaneously with epipolar geometry. Previous approaches to this problem adopt local greedy search methods which are sensitive to initialization, and may get trapped in local minima. We propose the first algorithm that guarantees global optimality for this problem. We first reformulate the problem using a separable form that allows us to search effectively in a 2D space, instead of on a 5D hypersphere in the classical formulation. Next, in a branch-and-bound algorithm we introduce a novel lower bounding function through interval matrix analysis. Experimental results on both synthetic and real scenes demonstrate that the proposed method is able to quickly obtain the optimal solution.

  • Oral session 5: Activity

    Page(s): 1
  • Activity recognition using the velocity histories of tracked keypoints

    Page(s): 104 - 111

    We present an activity recognition feature inspired by human psychophysical performance. This feature is based on the velocity history of tracked keypoints. We present a generative mixture model for video sequences using this feature, and show that it performs comparably to local spatio-temporal features on the KTH activity recognition dataset. In addition, we contribute a new activity recognition dataset, focusing on activities of daily living, with high resolution video sequences of complex actions. We demonstrate the superiority of our velocity history feature on high resolution video sequences of complicated activities. Further, we show how the velocity history feature can be extended, both with a more sophisticated latent velocity model, and by combining the velocity history feature with other useful information, like appearance, position, and high level semantic information. Our approach performs comparably to established and state-of-the-art methods on the KTH dataset, and significantly outperforms all other methods on our challenging new dataset.

  • Quasi-periodic event analysis for social game retrieval

    Page(s): 112 - 119

    A new problem of retrieving social games from unstructured videos is proposed. Social games are characterized by repetitions (with variations) of alternating turns between two players. We define games as quasi-periodic motion patterns in video based on their repetitiveness property. We have developed an algorithm to extract such patterns from video. The patterns extracted by our method, from video clips of social games taken from YouTube, are shown to correspond to meaningful stages of the games. We demonstrate promising results in retrieving social games from unstructured, lab-recorded footage of children's play, and identifying social interactions in a dataset of approximately 3.75 hours of home movies.
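
    The repetitiveness property being exploited can be illustrated with a simple autocorrelation test: a quasi-periodic motion signal correlates strongly with itself at a lag near its period. This is a generic illustration of the cue, not the paper's actual pattern-extraction algorithm, and the signal here is synthetic.

```python
import numpy as np

# Surface the dominant repetition period of a 1D motion signal via its
# autocorrelation: a quasi-periodic signal peaks at a lag equal to its
# (approximate) period. Illustrative only, not the paper's algorithm.
def dominant_period(signal, min_lag=2):
    s = signal - signal.mean()
    ac = np.correlate(s, s, mode='full')[len(s) - 1:]   # lags 0, 1, 2, ...
    return int(np.argmax(ac[min_lag:]) + min_lag)

t = np.arange(200)
rng = np.random.default_rng(0)
motion = np.sin(2 * np.pi * t / 20) + 0.1 * rng.normal(size=200)
period = dominant_period(motion)       # close to the true period of 20 frames
```

    Real game footage is only quasi-periodic (turns vary in timing and form), which is why the paper models the patterns explicitly rather than relying on a single global period.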
