
2011 IEEE International Conference on Computer Vision (ICCV)

Date: 6-13 Nov. 2011


Displaying results 1-25 of 362
  • Table of contents

    Publication Year: 2011, Page(s): i-xxiv
    PDF (214 KB) | Freely Available from IEEE
  • Message from Program Chairs

    Publication Year: 2011, Page(s): xxv-xxix
    PDF (152 KB) | Freely Available from IEEE
  • ICCV2011 Committees

    Publication Year: 2011, Page(s): xxx-xxxvii
    PDF (131 KB) | Freely Available from IEEE
  • Corporate sponsors

    Publication Year: 2011, Page(s): xxxviii
    PDF (151 KB) | Freely Available from IEEE
  • Oral session 1-1

    Publication Year: 2011, Page(s): xxxix
    PDF (42 KB) | Freely Available from IEEE
  • A graph-matching kernel for object categorization

    Publication Year: 2011, Page(s): 1792-1799
    Cited by: Papers (34)
    Multimedia
    PDF (1160 KB) | HTML

    This paper addresses the problem of category-level image classification. The underlying image model is a graph whose nodes correspond to a dense set of regions, and edges reflect the underlying grid structure of the image and act as springs to guarantee the geometric consistency of nearby regions during matching. A fast approximate algorithm for matching the graphs associated with two images is presented. This algorithm is used to construct a kernel appropriate for SVM-based image classification, and experiments with the Caltech 101, Caltech 256, and Scenes datasets demonstrate performance that matches or exceeds the state of the art for methods using a single type of feature.

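    The core idea (region descriptors on a grid, matched under a spring-like geometric penalty) can be sketched in a few lines. The toy below scores a greedy per-node matching and ignores the coupling between neighboring matches that the paper's approximate algorithm handles; all names and parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def grid_match_score(desc_a, desc_b, radius=2, spring=0.1):
    """Toy matching score between two (H, W, D) grids of region descriptors."""
    H, W, _ = desc_a.shape
    total = 0.0
    for i in range(H):
        for j in range(W):
            best = -np.inf
            # Search a small window of candidate matches in image B.
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    u, v = i + di, j + dj
                    if 0 <= u < H and 0 <= v < W:
                        sim = float(desc_a[i, j] @ desc_b[u, v])
                        # Spring penalty keeps matches near their grid position.
                        best = max(best, sim - spring * (di * di + dj * dj))
            total += best
    return total / (H * W)

# An SVM kernel can then be built by evaluating this score between all
# training pairs (symmetrized and corrected to be positive semi-definite).
a, b = np.random.rand(8, 8, 16), np.random.rand(8, 8, 16)
print(grid_match_score(a, b))
```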
  • Domain adaptation for object recognition: An unsupervised approach

    Publication Year: 2011, Page(s): 999-1006
    Cited by: Papers (54)
    PDF (401 KB) | HTML

    Adapting a classifier trained on a source domain to recognize instances from a new target domain is an important problem that has recently been receiving increased attention. In this paper, we present one of the first studies on unsupervised domain adaptation in the context of object recognition, where we have labeled data only from the source domain (and therefore do not have correspondences between object categories across domains). Motivated by incremental learning, we create intermediate representations of data between the two domains by viewing the generative subspaces (of same dimension) created from these domains as points on the Grassmann manifold, and sampling points along the geodesic between them to obtain subspaces that provide a meaningful description of the underlying domain shift. We then obtain the projections of labeled source domain data onto these subspaces, from which a discriminative classifier is learnt to classify projected data from the target domain. We discuss extensions of our approach for semi-supervised adaptation, and for cases with multiple source and target domains, and report competitive results on standard datasets.

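    The geometric core of the method, sampling subspaces along the Grassmann geodesic between the source and target PCA subspaces, admits a compact sketch. The principal-angle construction below is the standard one; the data shapes and variable names are assumptions for illustration.

```python
import numpy as np

def pca_basis(X, d):
    """Orthonormal basis (D x d) of the top-d principal subspace of X (n x D)."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T, full_matrices=False)
    return U[:, :d]

def grassmann_geodesic(Y0, Y1, t):
    """Subspace at fraction t in [0, 1] along the geodesic from span(Y0) to span(Y1)."""
    A, c, Bt = np.linalg.svd(Y0.T @ Y1)        # singular values are cos(theta)
    c = np.clip(c, -1.0, 1.0)
    theta = np.arccos(c)
    s = np.sin(theta)
    s[s < 1e-12] = 1.0                          # guard shared directions (theta = 0)
    G = (Y1 @ Bt.T - Y0 @ A @ np.diag(c)) @ np.diag(1.0 / s)
    return Y0 @ A @ np.diag(np.cos(t * theta)) + G @ np.diag(np.sin(t * theta))

# Intermediate subspaces between source and target; projecting labeled source
# data onto each yields the features the discriminative classifier is trained on.
D, d = 100, 10
Xs, Xt = np.random.randn(500, D), np.random.randn(400, D)
Y0, Y1 = pca_basis(Xs, d), pca_basis(Xt, d)
subspaces = [grassmann_geodesic(Y0, Y1, t) for t in np.linspace(0, 1, 6)]
feats = np.hstack([Xs @ S for S in subspaces])
```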
  • Structured class-labels in random forests for semantic image labelling

    Publication Year: 2011, Page(s): 2190-2197
    Cited by: Papers (21)
    PDF (1667 KB) | HTML

    In this paper we propose a simple and effective way to integrate structural information in random forests for semantic image labelling. By structural information we refer to the inherently available, topological distribution of object classes in a given image. Different object class labels will not be randomly distributed over an image but usually form coherently labelled regions. In this work we provide a way to incorporate this topological information in the popular random forest framework for performing low-level, unary classification. Our paper makes several contributions. First, we show how random forests can be augmented with structured label information. Second, we introduce a novel data splitting function that exploits the joint distributions observed in the structured label space for learning typical label transitions between object classes. Finally, we provide two possibilities for integrating the structured output predictions into concise, semantic labellings. In our experiments on the challenging MSRC and CamVid databases, we compare our method to standard random forest and conditional random field classification results.

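    The structured-label idea can be approximated with off-the-shelf multi-output trees: each leaf predicts a whole patch of labels rather than a single pixel's class. The sketch below uses scikit-learn in place of the paper's joint-distribution split function, with synthetic data standing in for real features and label maps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, feat_dim, patch = 2000, 32, 3          # 3x3 label patches (illustrative sizes)
X = rng.normal(size=(n, feat_dim))        # per-pixel appearance features
# Structured target: the 9 labels of the 3x3 neighborhood around each pixel
# (synthesized here; in practice taken from ground-truth label maps).
Y = (X[:, :patch * patch] > 0).astype(int)

# scikit-learn trees natively handle multi-output targets, so each leaf
# stores a joint prediction over the whole label patch.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)
patch_pred = forest.predict(X[:5])        # shape (5, 9): one label patch each
# Overlapping patch predictions are then fused (e.g., per-pixel voting) into a
# single coherent labelling, standing in for the paper's integration step.
```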
  • Oral session 1-2

    Publication Year: 2011, Page(s): xl
    PDF (42 KB) | Freely Available from IEEE
  • Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models

    Publication Year: 2011, Page(s): 193-200
    Cited by: Papers (6)
    Multimedia
    PDF (1767 KB) | HTML

    We propose a novel way to induce a random field from an energy function on discrete labels. It amounts to locally injecting noise to the energy potentials, followed by finding the global minimum of the perturbed energy function. The resulting Perturb-and-MAP random fields harness the power of modern discrete energy minimization algorithms, effectively transforming them into efficient random sampling algorithms, thus extending their scope beyond the usual deterministic setting. In this fashion we can enjoy the benefits of a sound probabilistic framework, such as the ability to represent the solution uncertainty or learn model parameters from training data, while completely bypassing costly Markov chain Monte Carlo procedures typically associated with discrete label Gibbs Markov random fields (MRFs). We study some interesting theoretical properties of the proposed model in juxtaposition to those of Gibbs MRFs and address the issue of principled design of the perturbation process. We present experimental results in image segmentation and scene labeling that illustrate the new qualitative aspects and the potential of the proposed model for practical computer vision applications.

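    The sampling recipe is easy to illustrate. For a unary-only energy, injecting Gumbel noise and taking the MAP (here a per-pixel argmin) yields exact Gibbs samples by the Gumbel-max trick; for energies with pairwise terms, the argmin below would be replaced by a discrete minimizer such as graph cuts. Sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, L = 64, 64, 4                      # toy image grid and number of labels
unary = rng.normal(size=(H, W, L))       # unary energies E_i(x_i)

def perturb_and_map_sample(unary, rng):
    # Gumbel(0, 1) noise injected independently into every unary potential.
    gumbel = -np.log(-np.log(rng.uniform(size=unary.shape)))
    perturbed = unary - gumbel
    # MAP step: for a unary-only energy the global minimum factorizes per pixel.
    return perturbed.argmin(axis=-1)

samples = [perturb_and_map_sample(unary, rng) for _ in range(100)]
# Empirical marginals from the samples match the Gibbs marginals
# softmax(-unary), which MCMC would otherwise have to estimate.
freq_label0 = np.mean([s == 0 for s in samples], axis=0)
```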
  • Discriminative learning of relaxed hierarchy for large-scale visual recognition

    Publication Year: 2011, Page(s): 2072-2079
    Cited by: Papers (15)
    PDF (2120 KB) | HTML

    In the real visual world, the number of categories a classifier needs to discriminate is on the order of hundreds or thousands. For example, the SUN dataset [24] contains 899 scene categories and ImageNet [6] has 15,589 synsets. Designing a multiclass classifier that is both accurate and fast at test time is an extremely important problem in both machine learning and computer vision communities. To achieve a good trade-off between accuracy and speed, we adopt the relaxed hierarchy structure from [15], where a set of binary classifiers are organized in a tree or DAG (directed acyclic graph) structure. At each node, classes are colored into positive and negative groups which are separated by a binary classifier while a subset of confusing classes is ignored. We color the classes and learn the induced binary classifier simultaneously using a unified and principled max-margin optimization. We provide an analysis on generalization error to justify our design. Our method has been tested on both Caltech-256 (object recognition) [9] and the SUN dataset (scene classification) [24], and shows significant improvement over existing methods.

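    The underlying structure, a tree of binary classifiers that each separate two groups of classes, can be sketched as follows. This toy colors classes by clustering their mean features, and omits both the paper's joint max-margin optimization and its relaxation that ignores confusing classes; all names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_node(X, y, classes):
    if len(classes) == 1:
        return {"leaf": classes[0]}
    # Color classes into two groups by clustering their mean features.
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    coloring = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(means)
    pos = [c for c, g in zip(classes, coloring) if g == 1]
    neg = [c for c, g in zip(classes, coloring) if g == 0]
    if not pos or not neg:                       # degenerate split: halve the list
        pos, neg = list(classes[: len(classes) // 2]), list(classes[len(classes) // 2:])
    mask = np.isin(y, classes)
    clf = LinearSVC().fit(X[mask], np.isin(y[mask], pos).astype(int))
    return {"clf": clf, "pos": build_node(X, y, pos), "neg": build_node(X, y, neg)}

def predict_one(node, x):
    # Test cost is one binary classifier per level, roughly O(log #classes):
    # the accuracy/speed trade-off the paper is after.
    while "leaf" not in node:
        node = node["pos"] if node["clf"].predict(x[None])[0] == 1 else node["neg"]
    return node["leaf"]

# Usage: tree = build_node(X_train, y_train, sorted(set(y_train)))
```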
  • Decision tree fields

    Publication Year: 2011, Page(s): 1668-1675
    Cited by: Papers (18)
    Multimedia
    PDF (727 KB) | HTML

    This paper introduces a new formulation for discrete image labeling tasks, the Decision Tree Field (DTF), that combines and generalizes random forests and conditional random fields (CRF) which have been widely used in computer vision. In a typical CRF model the unary potentials are derived from sophisticated random forest or boosting-based classifiers; however, the pairwise potentials are assumed to (1) have a simple parametric form with a pre-specified and fixed dependence on the image data, and (2) be defined on the basis of a small and fixed neighborhood. In contrast, in DTF, local interactions between multiple variables are determined by means of decision trees evaluated on the image data, allowing the interactions to be adapted to the image content. This results in powerful graphical models which are able to represent complex label structure. Our key technical contribution is to show that the DTF model can be trained efficiently and jointly using a convex approximate likelihood function, enabling us to learn over a million free model parameters. We show experimentally that for applications which have a rich and complex label structure, our model achieves excellent results.

  • Oral session 1-3

    Publication Year: 2011, Page(s): xli
    PDF (42 KB) | Freely Available from IEEE
  • Strong supervision from weak annotation: Interactive training of deformable part models

    Publication Year: 2011, Page(s): 1832-1839
    Cited by: Papers (18)
    Multimedia
    PDF (2097 KB) | HTML

    We propose a framework for large-scale learning and annotation of structured models. The system interleaves interactive labeling (where the current model is used to semi-automate the labeling of a new example) and online learning (where a newly labeled example is used to update the current model parameters). This framework is scalable to large datasets and complex image models and is shown to have excellent theoretical and practical properties in terms of train time, optimality guarantees, and bounds on the amount of annotation effort per image. We apply this framework to part-based detection, and introduce a novel algorithm for interactive labeling of deformable part models. The labeling tool updates and displays in real-time the maximum likelihood location of all parts as the user clicks and drags the location of one or more parts. We demonstrate that the system can be used to efficiently and robustly train part and pose detectors on CUB Birds-200, a challenging dataset of birds in unconstrained pose and environment.

  • Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance

    Publication Year: 2011, Page(s): 161-168
    Cited by: Papers (31)
    PDF (7635 KB) | HTML

    Subordinate-level categorization typically rests on establishing salient distinctions between part-level characteristics of objects, in contrast to basic-level categorization, where the presence or absence of parts is determinative. We develop an approach for subordinate categorization in vision, focusing on an avian domain due to the fine-grained structure of the category taxonomy for this domain. We explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Training pose detectors requires a relatively large amount of training data per category when done from scratch; using a subordinate-level approach, we exploit a pose classifier trained at the basic-level, and extract part appearance and shape information to build subordinate-level models. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies, facilitating fine-grained categorization from relatively few training examples.

  • From contours to 3D object detection and pose estimation

    Publication Year: 2011, Page(s): 983-990
    Cited by: Papers (15)
    Multimedia
    PDF (9606 KB) | HTML

    This paper addresses view-invariant object detection and pose estimation from a single image. While recent work focuses on object-centered representations of point-based object features, we revisit the viewer-centered framework, and use image contours as basic features. Given training examples of arbitrary views of an object, we learn a sparse object model in terms of a few view-dependent shape templates. The shape templates are jointly used for detecting object occurrences and estimating their 3D poses in a new image. Instrumental to this is our new mid-level feature, called bag of boundaries (BOB), aimed at lifting from individual edges toward their more informative summaries for identifying object boundaries amidst the background clutter. In inference, BOBs are placed on deformable grids both in the image and the shape templates, and then matched. This is formulated as a convex optimization problem that accommodates invariance to non-rigid, locally affine shape deformations. Evaluation on benchmark datasets demonstrates our competitive results relative to the state of the art.

  • Oral session 2-1

    Publication Year: 2011, Page(s): xlii
    PDF (42 KB) | Freely Available from IEEE
  • What an image reveals about material reflectance

    Publication Year: 2011, Page(s): 1076-1083
    Cited by: Papers (8)
    PDF (4169 KB) | HTML

    We derive precise conditions under which material reflectance properties may be estimated from a single image of a homogeneous curved surface (canonically a sphere), lit by a directional source. Based on the observation that light is reflected along certain (a priori unknown) preferred directions such as the half-angle, we propose a semiparametric BRDF abstraction that lies between purely parametric and purely data-driven models. Formulating BRDF estimation as a particular type of semiparametric regression, both the preferred directions and the form of BRDF variation along them can be estimated from data. Our approach has significant theoretical, algorithmic and empirical benefits, lends insights into material behavior and enables novel applications. While it is well-known that fitting multi-lobe BRDFs may be ill-posed under certain conditions, prior to this work, precise results for the well-posedness of BRDF estimation had remained elusive. Since our BRDF representation is derived from physical intuition, but relies on data, we avoid pitfalls of both parametric (low generalizability) and non-parametric regression (low interpretability, curse of dimensionality). Finally, we discuss several applications such as single-image relighting, light source estimation and physically meaningful BRDF editing.

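    A hedged toy of the starting intuition: on a curved surface lit by a directional source, intensity is often a smooth one-dimensional function of a preferred direction such as the half-angle. The sketch below recovers that function from a synthetic sphere by binned regression; unlike the paper, it fixes the preferred direction by assumption rather than estimating it, and all quantities are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
light = np.array([0.0, 0.3, 1.0]); light /= np.linalg.norm(light)
view = np.array([0.0, 0.0, 1.0])
half = (light + view) / np.linalg.norm(light + view)

# Normals sampled on the visible hemisphere of a sphere.
n = rng.normal(size=(5000, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
n = n[n[:, 2] > 0]

cos_h = np.clip(n @ half, 0.0, 1.0)            # cosine of the half-angle
lobe_true = 0.2 + 2.0 * cos_h ** 50            # unknown 1-D reflectance lobe
shading = np.clip(n @ light, 0.0, None)
img = lobe_true * shading + 0.01 * rng.normal(size=len(n))

# Keep well-lit pixels, divide out the shading term, and average intensities
# within half-angle bins: a crude nonparametric fit of the lobe's shape.
lit = shading > 0.2
ratio = img[lit] / shading[lit]
bins = np.digitize(cos_h[lit], np.linspace(0.0, 1.0, 30))
lobe_fit = np.array([ratio[bins == k].mean() for k in np.unique(bins)])
```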
  • Multiplexed illumination for scene recovery in the presence of global illumination

    Publication Year: 2011, Page(s): 691-698
    Cited by: Papers (9)
    Multimedia
    PDF (4211 KB) | HTML

    Global illumination effects such as inter-reflections and subsurface scattering result in systematic and often significant errors in scene recovery using active illumination. Recently, it was shown that the direct and global components could be separated efficiently for a scene illuminated with a single light source. In this paper, we study the problem of direct-global separation for multiple light sources. We derive a theoretical lower bound for the number of required images, and propose a multiplexed illumination scheme which achieves this lower bound. We analyze the signal-to-noise ratio (SNR) characteristics of the proposed illumination multiplexing method in the context of direct-global separation. We apply our method to several scene recovery techniques requiring multiple light sources, including shape from shading, structured light 3D scanning, photometric stereo, and reflectance estimation. Both simulation and experimental results show that the proposed method can accurately recover scene information with fewer images compared to sequentially separating direct-global components for each light source.

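    The measurement model behind multiplexed illumination is compact enough to sketch: each captured image is a known linear combination of the individual sources, and per-source images are recovered by inverting the mixing matrix. The random pattern below is an illustrative choice; it does not reproduce the paper's lower-bound-achieving direct-global scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_pix = 7, 10000
per_source = rng.uniform(size=(n_src, n_pix))        # unknown per-source images

# Each row of M switches a subset of sources on for one captured image.
# Turning on about half the sources per shot raises SNR over one-at-a-time
# capture when sensor noise dominates (Schechner et al.).
M = (rng.uniform(size=(n_src, n_src)) > 0.5).astype(float)
while np.linalg.matrix_rank(M) < n_src:              # retry until invertible
    M = (rng.uniform(size=(n_src, n_src)) > 0.5).astype(float)

captured = M @ per_source + 0.01 * rng.normal(size=(n_src, n_pix))
recovered = np.linalg.solve(M, captured)             # demultiplexed images
print(np.abs(recovered - per_source).max())
```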
  • Oral session 2-2

    Publication Year: 2011, Page(s): xliii
    PDF (42 KB) | Freely Available from IEEE
  • Data-driven crowd analysis in videos

    Publication Year: 2011, Page(s): 1235-1242
    Cited by: Papers (24)
    Multimedia
    PDF (6375 KB) | HTML

    In this work we present a new crowd analysis algorithm powered by behavior priors that are learned on a large database of crowd videos gathered from the Internet. The algorithm works by first learning a set of crowd behavior priors off-line. During testing, crowd patches are matched to the database and behavior priors are transferred. We exploit the insight that, although the space of possible crowd behaviors is infinite, the space of distinguishable crowd motion patterns may not be all that large. For many individuals in a crowd, we are able to find analogous crowd patches in our database which contain similar patterns of behavior that can effectively act as priors to constrain the difficult task of tracking an individual in a crowd. Our algorithm is data-driven and, unlike some crowd characterization methods, does not require us to have seen the test video beforehand. It performs on par with state-of-the-art methods when tracking people exhibiting common crowd behaviors, and outperforms them when the tracked individual behaves in an unusual way.

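    The retrieval step at the heart of the data-driven approach can be sketched directly: describe a crowd patch, find its nearest neighbors in a database of patches from other videos, and transfer their stored motion priors. The descriptor and database below are synthetic placeholders for the paper's learned priors.

```python
import numpy as np

rng = np.random.default_rng(0)
db_desc = rng.uniform(size=(5000, 64))       # descriptors of database patches
db_prior = rng.uniform(size=(5000, 8))       # per-patch motion-direction priors
db_prior /= db_prior.sum(axis=1, keepdims=True)

def transfer_prior(query_desc, k=10):
    """Average the priors of the k nearest database patches."""
    d = np.linalg.norm(db_desc - query_desc, axis=1)
    nn = np.argsort(d)[:k]
    return db_prior[nn].mean(axis=0)         # behavior prior fed to the tracker

prior = transfer_prior(rng.uniform(size=64))
```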
  • A “string of feature graphs” model for recognition of complex activities in natural videos

    Publication Year: 2011, Page(s): 2595-2602
    Cited by: Papers (15)
    PDF (1598 KB) | HTML

    Videos usually consist of activities involving interactions between multiple actors, sometimes referred to as complex activities. Recognition of such activities requires modeling the spatio-temporal relationships between the actors and their individual variabilities. In this paper, we consider the problem of recognition of complex activities in a video given a query example. We propose a new feature model based on a string representation of the video which respects the spatio-temporal ordering. The local collections of features (e.g., cuboids, STIP) that form the characters of the string are initially matched using graph-based spectral techniques. Final recognition is obtained by matching the string representations of the query and the test videos in a dynamic programming framework which allows for variability in sampling rates and speed of activity execution. The method does not require tracking or recognition of body parts, is able to identify the region of interest in a cluttered scene, and gives reasonable performance with even a single query example. We test our approach in an example-based video retrieval framework with two publicly available complex activity datasets and provide comparisons against other methods that have studied this problem.

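    The two-level matching the abstract describes lends itself to a short sketch: each character is a small feature graph compared via a spectral signature, and the two strings are aligned with a DTW-style dynamic program that tolerates differences in execution speed. The spectral distance below is one simple stand-in for the paper's graph-matching step; the graphs are synthetic.

```python
import numpy as np

def spectral_distance(A1, A2):
    """Distance between two graphs given their (symmetric) adjacency matrices."""
    e1 = np.sort(np.linalg.eigvalsh(A1))[::-1]
    e2 = np.sort(np.linalg.eigvalsh(A2))[::-1]
    k = max(len(e1), len(e2))                    # zero-pad to equal length
    e1 = np.pad(e1, (0, k - len(e1)))
    e2 = np.pad(e2, (0, k - len(e2)))
    return float(np.linalg.norm(e1 - e2))

def match_strings(graphs_q, graphs_t):
    """DTW alignment cost between two strings of feature graphs."""
    n, m = len(graphs_q), len(graphs_t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = spectral_distance(graphs_q[i - 1], graphs_t[j - 1])
            # Insertions/deletions absorb sampling-rate and speed variations.
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
rand_adj = lambda k: (lambda M: (M + M.T) / 2)(rng.uniform(size=(k, k)))
query = [rand_adj(5) for _ in range(6)]
test = [rand_adj(5) for _ in range(9)]
print(match_strings(query, test))
```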
  • Learning spatiotemporal graphs of human activities

    Publication Year: 2011, Page(s): 778-785
    Cited by: Papers (40)
    PDF (3338 KB) | HTML

    Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and temporal relations (e.g., allow only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning what activity parts and their spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new, multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph, and pdfs associated with model nodes and edges. The model adaptively learns from data relevant video segments and their relations, addressing the “what” and “how.” Inference and learning are formulated within the same framework, that of a robust, least-squares optimization, which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy trade-off.

  • Human action recognition by learning bases of action attributes and parts

    Publication Year: 2011, Page(s): 1331-1338
    Cited by: Papers (27)
    PDF (537 KB) | HTML

    In this work, we propose to use attributes and parts for recognizing human actions in still images. We define action attributes as the verbs that describe the properties of human actions, while the parts of actions are objects and poselets that are closely related to the actions. We jointly model the attributes and parts by learning a set of sparse bases that are shown to carry much semantic meaning. Then, the attributes and parts of an action image can be reconstructed from sparse coefficients with respect to the learned bases. This dual sparsity provides theoretical guarantee of our bases learning and feature reconstruction approach. On the PASCAL action dataset and a new “Stanford 40 Actions” dataset, we show that our method extracts meaningful high-order interactions between attributes and parts in human actions while achieving state-of-the-art classification performance.

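    The reconstruction step can be illustrated with a generic sparse-coding routine: learn a dictionary of sparse bases over concatenated attribute and part activations, and use the sparse codes as classification features. This substitutes scikit-learn's dictionary learning for the authors' formulation; all sizes and data are synthetic.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_images, n_attr, n_parts = 300, 14, 50
# Concatenated attribute/part activation vector per action image (synthetic).
A = rng.uniform(size=(n_images, n_attr + n_parts))

dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=200,
                          random_state=0)
codes = dico.fit_transform(A)          # sparse coefficients w.r.t. learned bases
recon = codes @ dico.components_       # reconstructed attribute/part vectors
# 'codes' would then feed a linear classifier over action categories.
```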
  • Oral session 2-3

    Publication Year: 2011, Page(s): xliv
    PDF (42 KB) | Freely Available from IEEE