IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 7 • July 2012

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • Free Energy Score Spaces: Using Generative Information in Discriminative Classifiers

    Page(s): 1249 - 1262

    A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments or other sequential data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as “score space.” Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers or discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account the latent structure of the data at various levels and can be shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model and the same factorization of the posterior. We also show that in several typical computer vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches that combine discriminative and generative models.

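    To ground the idea, the sketch below builds a free-energy-style score space around a Gaussian mixture with scikit-learn and trains a discriminative classifier in it. This is an illustration, not the paper's construction (which is developed for latent-variable models and their posterior factorizations); the model choice and all names are assumptions. Keeping the per-component additive terms of the free energy, instead of summing them into a single number, is what yields the fixed-length feature vector:

        import numpy as np
        from scipy.stats import multivariate_normal
        from sklearn.mixture import GaussianMixture
        from sklearn.svm import LinearSVC

        def free_energy_scores(gmm, X):
            # log pi_k + log N(x | mu_k, Sigma_k), one column per component
            comp_ll = np.column_stack([
                np.log(gmm.weights_[k])
                + multivariate_normal.logpdf(X, gmm.means_[k], gmm.covariances_[k])
                for k in range(gmm.n_components)
            ])
            q = gmm.predict_proba(X)  # exact posterior q(k|x), the responsibilities
            # Per-component additive terms of the free energy
            #   F(x) = sum_k q(k|x) (log q(k|x) - log pi_k - log N(x|mu_k, Sigma_k));
            # summing each row recovers -log p(x); keeping the terms gives the score vector.
            return q * (np.log(q + 1e-12) - comp_ll)

        # fit one generative model, then classify in its score space
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2)); y = (X[:, 0] > 0).astype(int)
        gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
        clf = LinearSVC(dual=False).fit(free_energy_scores(gmm, X), y)
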
  • A Tangent Bundle Theory for Visual Curve Completion

    Page(s): 1263 - 1280

    Visual curve completion is a fundamental perceptual mechanism that completes the missing parts (e.g., due to occlusion) between observed contour fragments. Previous research into the shape of completed curves has generally followed an “axiomatic” approach, where desired perceptual/geometrical properties are first defined as axioms, followed by mathematical investigation into curves that satisfy them. However, determining such desired properties psychophysically is difficult, and researchers still debate what they should be in the first place. Instead, here we exploit the observation that curve completion is an early visual process to formalize the problem in the unit tangent bundle R2 × S1, which abstracts the primary visual cortex (V1) and facilitates exploration of basic principles from which perceptual properties are later derived rather than imposed. Exploring here the elementary principle of least action in V1, we show how the problem becomes one of finding minimum-length admissible curves in R2 × S1. We formalize the problem in variational terms, analyze it theoretically, and formulate practical algorithms for the reconstruction of these completed curves. We then explore their induced visual properties vis-à-vis popular perceptual axioms and show how our theory predicts many perceptual properties reported in the corresponding perceptual literature. Finally, we demonstrate a variety of curve completions and report comparisons to psychophysical data and other completion models.

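    In the notation of the abstract, the variational problem can be written as follows (a sketch; the paper's exact functional and boundary treatment may differ). A completed curve is a shortest admissible curve gamma(t) = (x(t), y(t), theta(t)) in R2 × S1:

        \min_{\gamma}\; \int_{0}^{1} \sqrt{\dot{x}(t)^{2} + \dot{y}(t)^{2} + \dot{\theta}(t)^{2}}\; dt
        \quad \text{subject to} \quad
        \dot{x}(t)\,\sin\theta(t) \;=\; \dot{y}(t)\,\cos\theta(t),

        \gamma(0) = (x_{0}, y_{0}, \theta_{0}), \qquad \gamma(1) = (x_{1}, y_{1}, \theta_{1}).

    The constraint is the admissibility (lift) condition: theta(t) must remain the orientation of the planar tangent, so the minimizer in R2 × S1 projects to a smooth completed contour in the image plane, while the boundary conditions encode the positions and orientations of the two visible fragments.
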
  • BRIEF: Computing a Local Binary Descriptor Very Fast

    Page(s): 1281 - 1298

    Binary descriptors are becoming increasingly popular as a means to compare feature points very fast while requiring comparatively small amounts of memory. The typical approach to creating them is to first compute floating-point ones, using an algorithm such as SIFT, and then to binarize them. In this paper, we show that we can directly compute a binary descriptor, which we call BRIEF, on the basis of simple intensity difference tests. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and SIFT on standard benchmarks and show that it yields comparable recognition accuracy, while running in an almost vanishing fraction of the time required by either.

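    The core test is simple enough to sketch in a few lines of Python (an illustration, not the authors' code: real BRIEF draws the test locations from an isotropic Gaussian and packs the bits into bytes so that matching is a XOR plus popcount):

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def brief_descriptor(patch, pairs, sigma=2.0):
            """One bit per test: is the smoothed patch darker at point 1 than at point 2?"""
            s = gaussian_filter(patch.astype(np.float32), sigma)  # smooth to suppress noise
            (r1, c1), (r2, c2) = pairs[:, 0].T, pairs[:, 1].T
            return (s[r1, c1] < s[r2, c2]).astype(np.uint8)

        def hamming(d1, d2):
            return int(np.count_nonzero(d1 != d2))

        # the test locations are drawn once and reused for every patch
        rng = np.random.default_rng(0)
        S, n_bits = 32, 256                                       # 32x32 patch, 256 tests
        pairs = rng.integers(0, S, size=(n_bits, 2, 2))
        a = rng.random((S, S)); b = a + 0.01 * rng.random((S, S))
        print(hamming(brief_descriptor(a, pairs), brief_descriptor(b, pairs)))
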
  • Constrained Nonnegative Matrix Factorization for Image Representation

    Page(s): 1299 - 1311

    Nonnegative matrix factorization (NMF) is a popular technique for finding parts-based, linear representations of nonnegative data. It has been successfully applied in a wide range of applications such as pattern recognition, information retrieval, and computer vision. However, NMF is essentially an unsupervised method and cannot make use of label information. In this paper, we propose a novel semi-supervised matrix decomposition method, called Constrained Nonnegative Matrix Factorization (CNMF), which incorporates the label information as additional constraints. Specifically, we show how explicitly combining label information improves the discriminating power of the resulting matrix decomposition. We explore the proposed CNMF method with two cost function formulations and provide the corresponding update solutions for the optimization problems. Empirical experiments demonstrate the effectiveness of our novel algorithm in comparison to the state-of-the-art approaches through a set of evaluations based on real-world applications.

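    For reference, the unsupervised baseline being constrained is NMF with the classic multiplicative updates; the sketch below shows only that baseline (the label-constraint parameterization and the corresponding update rules are the paper's contribution and are not reproduced):

        import numpy as np

        def nmf(X, r, n_iter=200, eps=1e-9):
            """Lee-Seung multiplicative updates for X ~= U @ V.T with U, V >= 0.
            CNMF would additionally parameterize V through a label matrix so that
            samples sharing a label share a representation."""
            m, n = X.shape
            rng = np.random.default_rng(0)
            U, V = rng.random((m, r)), rng.random((n, r))
            for _ in range(n_iter):
                U *= (X @ V) / (U @ (V.T @ V) + eps)
                V *= (X.T @ U) / (V @ (U.T @ U) + eps)
            return U, V

        X = np.abs(np.random.default_rng(1).random((50, 30)))
        U, V = nmf(X, r=5)
        print(np.linalg.norm(X - U @ V.T) / np.linalg.norm(X))  # relative residual
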
  • CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts

    Page(s): 1312 - 1328

    We present a novel framework to generate and rank plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge of the properties of individual object classes, by solving a sequence of Constrained Parametric Min-Cut problems (CPMC) on a regular image grid. In a subsequent step, we learn to rank the corresponding segments by training a continuous model to predict how likely they are to exhibit real-world regularities (expressed as putative overlap with ground truth) based on their mid-level region properties, then diversify the estimated overlap score using maximum marginal relevance measures. We show that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC 2009 and 2010 data sets. In our companion papers [1], [2], we show that the algorithm can be used, successfully, in a segmentation-based visual object category recognition pipeline. This architecture ranked first in the VOC2009 and VOC2010 image segmentation and labeling challenges.

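    A toy instance of one such seeded min-cut can be set up with networkx (an illustrative sketch, not the paper's solver: CPMC uses fast parametric max-flow, many seed placements, and richer unaries). The seed is hard foreground, the image border is hard background, and the uniform foreground bias lam is the parameter swept to produce multiple hypotheses per seed:

        import numpy as np
        import networkx as nx

        def seed_cut(img, seed, lam, sigma=0.1):
            """One figure-ground hypothesis on a 2D grayscale image."""
            h, w = img.shape
            G = nx.DiGraph()
            for y in range(h):
                for x in range(w):
                    p = (y, x)
                    if p == seed:
                        G.add_edge('s', p)                # no capacity attr = infinite
                    else:
                        G.add_edge('s', p, capacity=lam)  # parametric foreground bias
                    if y in (0, h - 1) or x in (0, w - 1):
                        G.add_edge(p, 't')                # border: hard background
                    for q in ((y + 1, x), (y, x + 1)):    # 4-connected grid
                        if q[0] < h and q[1] < w:
                            c = float(np.exp(-abs(img[p] - img[q]) / sigma))
                            G.add_edge(p, q, capacity=c)  # contrast-sensitive
                            G.add_edge(q, p, capacity=c)
            _, (fg, _) = nx.minimum_cut(G, 's', 't')
            return sorted(p for p in fg if p != 's')

        img = np.zeros((9, 9)); img[3:6, 3:6] = 1.0       # bright square on dark ground
        print(seed_cut(img, seed=(4, 4), lam=0.3))        # recovers the 3x3 square
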
  • Difference-Based Image Noise Modeling Using Skellam Distribution

    Page(s): 1329 - 1341
    Multimedia

    By the laws of quantum physics, pixel intensity does not have a true value but should be treated as a random variable. Contrary to conventional assumptions, the distribution of intensity may not be additive Gaussian. We propose to directly model the intensity difference and show its validity by an experimental comparison to the conventional additive model. As a model of the intensity difference, we present a Skellam distribution derived from the Poisson photon noise model. This modeling induces a linear relationship between intensity and the Skellam parameters, while conventional variance computation methods do not yield any significant relationship between these parameters under natural illumination. The intensity-Skellam line is invariant to scene, illumination, and even most camera parameters. We also propose practical methods to obtain the line using a color pattern and an arbitrary image under natural illumination. Because the Skellam parameters that can be obtained from this linearity determine a noise distribution for each intensity value, we can statistically determine whether any intensity difference is caused by an underlying signal difference or by noise. We demonstrate the effectiveness of this new noise model by applying it to practical applications of background subtraction and edge detection.

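    The resulting hypothesis test is easy to reproduce with SciPy's Skellam distribution (a sketch; the mu values below are invented, whereas in practice they would be read off the calibrated intensity-Skellam line for the camera):

        from scipy.stats import skellam

        def is_signal_difference(diff, mu1, mu2, alpha=0.01):
            """Under the Poisson photon model, the difference of two measurements
            of the same scene point follows Skellam(mu1, mu2); reject the
            noise-only hypothesis when the two-sided tail probability is small."""
            d = abs(diff)
            p = skellam.cdf(-d, mu1, mu2) + skellam.sf(d - 1, mu1, mu2)
            return p < alpha

        print(is_signal_difference(3, mu1=10.0, mu2=10.0))   # False: plausibly noise
        print(is_signal_difference(25, mu1=10.0, mu2=10.0))  # True: likely a real change
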
  • IntentSearch: Capturing User Intention for One-Click Internet Image Search

    Page(s): 1342 - 1353
    Multimedia

    Web-scale image search engines (e.g., Google image search, Bing image search) mostly rely on surrounding text features. It is difficult for them to interpret users' search intention from query keywords alone, and this leads to ambiguous and noisy search results which are far from satisfactory. It is important to use visual information in order to resolve the ambiguity in text-based image retrieval. In this paper, we propose a novel Internet image search approach. It requires the user only to click on one query image with minimum effort, and images from a pool retrieved by text-based search are reranked based on both visual and textual content. Our key contribution is to capture the user's search intention from this one-click query image in four steps. 1) The query image is categorized into one of the predefined adaptive weight categories, which reflect users' search intention at a coarse level. Inside each category, a specific weight schema is used to combine visual features adapted to this kind of image to better rerank the text-based search result. 2) Based on the visual content of the query image selected by the user and through image clustering, query keywords are expanded to capture user intention. 3) Expanded keywords are used to enlarge the image pool to contain more relevant images. 4) Expanded keywords are also used to expand the query image to multiple positive visual examples from which new query-specific visual and textual similarity metrics are learned to further improve content-based image reranking. All these steps are automatic, with no extra effort from the user. This is critically important for any commercial web-based image search engine, where the user interface has to be extremely simple. Besides this key contribution, we design a set of visual features that are both effective and efficient for Internet image search. Experimental evaluation shows that our approach significantly improves the precision of top-ranked images and also the user experience.

  • Learning Hybrid Image Templates (HIT) by Information Projection

    Page(s): 1354 - 1367

    This paper presents a novel framework for learning a generative image representation, the hybrid image template (HIT), from a small number (i.e., 3 to 20) of image examples. Each learned template is composed of, typically, 50 to 500 image patches whose geometric attributes (location, scale, orientation) may adapt in a local neighborhood for deformation, and whose appearances are characterized, respectively, by four types of descriptors: local sketch (edge or bar), texture gradients with orientations, flatness regions, and colors. These heterogeneous patches are automatically ranked and selected from a large pool according to their information gains using an information projection framework. Intuitively, a patch has a higher information gain if 1) its feature statistics are consistent within the training examples and are distinctive from the statistics of negative examples (i.e., generic images or examples from other categories); and 2) its feature statistics have less intraclass variation. The learning process pursues the most informative (for either generative or discriminative purposes) patches one at a time and stops when the information gain is within statistical fluctuation. The template is associated with a well-normalized probability model that integrates the heterogeneous feature statistics. This automated feature selection procedure allows our algorithm to scale up to a wide range of image categories, from those with regular shapes to those with stochastic texture. The learned representation captures the intrinsic characteristics of the object or scene categories. We evaluate the hybrid image templates on several public benchmarks and demonstrate classification performance on par with state-of-the-art methods like HoG+SVM; when small training sample sizes are used, the proposed system shows a clear advantage.

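    The ranking criterion can be illustrated in a few lines (a sketch of the information-gain idea only; the paper's information projection also updates the reference model after each patch is pursued, which is omitted here): score each candidate by the divergence between its response statistics on the training examples and on a negative pool, pursue candidates in order, and stop when the gain is negligible:

        import numpy as np

        def info_gain(f_pos, f_neg, bins=16):
            """KL divergence between a feature's response histograms on the
            positive examples and on the negative pool (add-one smoothing)."""
            lo = min(f_pos.min(), f_neg.min()); hi = max(f_pos.max(), f_neg.max())
            p = np.histogram(f_pos, bins, (lo, hi))[0] + 1.0
            q = np.histogram(f_neg, bins, (lo, hi))[0] + 1.0
            p /= p.sum(); q /= q.sum()
            return float(np.sum(p * np.log(p / q)))

        def pursue(F_pos, F_neg, eps=0.4):
            """Greedy selection: F_pos[i] holds candidate i's responses on the
            positives, F_neg[i] on the negatives; stop below the gain threshold."""
            gains = np.array([info_gain(fp, fn) for fp, fn in zip(F_pos, F_neg)])
            order = np.argsort(gains)[::-1]
            return [int(i) for i in order if gains[i] >= eps]

        rng = np.random.default_rng(0)
        F_neg = rng.normal(size=(20, 500))             # generic-image responses
        F_pos = rng.normal(size=(20, 50))
        F_pos[3] += 2.0; F_pos[7] += 1.5               # two informative candidates
        print(pursue(F_pos, F_neg))                    # the shifted candidates rank first
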
  • Maximum Likelihood Estimation of Depth Maps Using Photometric Stereo

    Page(s): 1368 - 1380
    Multimedia

    Photometric stereo and depth-map estimation provide a way to construct a depth map from images of an object under one viewpoint but with varying illumination directions. While estimating surface normals using the Lambertian model of reflectance is well established, depth-map estimation is an ongoing field of research and dealing with image noise is an active topic. Using the zero-mean Gaussian model of image noise, this paper introduces a method for maximum likelihood depth-map estimation that accounts for the propagation of noise through all steps of the estimation process. Solving for maximum likelihood depth-map estimates involves an independent sequence of nonlinear regression estimates, one for each pixel, followed by a single large and sparse linear regression estimate. The linear system employs anisotropic weights, which arise naturally and differ in value to related work. The new depth-map estimation method remains efficient and fast, making it practical for realistic image sizes. Experiments using synthetic images demonstrate the method's ability to robustly estimate depth maps under the noise model. Practical benefits of the method on challenging imaging scenarios are illustrated by experiments using the Extended Yale Face Database B and an extensive data set of 500 reflected light microscopy image sequences.

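    The pipeline builds on the classical least-squares step, which is compact enough to sketch (illustrative code; the paper's contribution, propagating the Gaussian noise through this step and the subsequent sparse depth integration with anisotropic weights, is not reproduced here):

        import numpy as np

        def lambertian_normals(I, L):
            """Classical photometric stereo: I holds m images of n pixels observed
            under m known light directions L (m x 3). Solving L @ g = I in the
            least-squares sense gives g = albedo * normal at every pixel."""
            g, *_ = np.linalg.lstsq(L, I, rcond=None)   # 3 x n
            albedo = np.linalg.norm(g, axis=0)
            return g / np.maximum(albedo, 1e-12), albedo

        # synthetic check: 4 upper-hemisphere lights, 5 pixels, normals all (0, 0, 1)
        rng = np.random.default_rng(0)
        L = rng.normal(size=(4, 3)); L[:, 2] = np.abs(L[:, 2]) + 0.5
        L /= np.linalg.norm(L, axis=1, keepdims=True)
        n_true = np.tile([[0.0], [0.0], [1.0]], 5)
        I = np.clip(L @ n_true, 0, None) + 0.01 * rng.normal(size=(4, 5))
        normals, albedo = lambertian_normals(I, L)
        print(normals.T)                                # rows close to (0, 0, 1)
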
  • Polynomial Eigenvalue Solutions to Minimal Problems in Computer Vision

    Page(s): 1381 - 1393
    Multimedia

    We present a method for solving systems of polynomial equations appearing in computer vision. This method is based on polynomial eigenvalue solvers and is more straightforward and easier to implement than the state-of-the-art Gröbner basis method, since eigenvalue problems are well studied and easy to understand, and efficient, robust algorithms for solving them are available. We provide a characterization of problems that can be efficiently solved as polynomial eigenvalue problems (PEPs) and present a resultant-based method for transforming a system of polynomial equations into a polynomial eigenvalue problem. We propose techniques that can be used to reduce the size of the computed polynomial eigenvalue problems. To show the applicability of the proposed polynomial eigenvalue method, we present polynomial eigenvalue solutions to several important minimal relative pose problems.

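    For intuition, a degree-two PEP, (A0 + lam*A1 + lam^2*A2) x = 0, reduces to an ordinary generalized eigenvalue problem through the standard companion linearization; the sketch below uses random matrices where a minimal problem would supply the Ai from its equations:

        import numpy as np
        from scipy.linalg import eig

        def quadratic_pep(A0, A1, A2):
            """Linearize with z = [x; lam*x]: A z = lam B z."""
            n = A0.shape[0]
            I, Z = np.eye(n), np.zeros((n, n))
            A = np.block([[Z, I], [-A0, -A1]])
            B = np.block([[I, Z], [Z, A2]])
            lam, V = eig(A, B)            # generalized eigenvalues and eigenvectors
            return lam, V[:n, :]          # the top half of z recovers x

        rng = np.random.default_rng(0)
        A0, A1, A2 = (rng.normal(size=(3, 3)) for _ in range(3))
        lam, X = quadratic_pep(A0, A1, A2)
        k = int(np.argmin(np.abs(lam.imag)))  # inspect one (near-)real solution
        res = (A0 + lam[k] * A1 + lam[k] ** 2 * A2) @ X[:, k]
        print(lam[k], np.abs(res).max())      # residual at machine precision
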
  • Toward Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

    Page(s): 1394 - 1408

    Scene understanding includes many related subtasks, such as scene categorization, depth estimation, object detection, etc. Each of these subtasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the subtasks while requiring only a “black box” interface to the original classifier for each subtask. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about which error modes to focus on. We show that our method significantly improves performance in all the subtasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling, and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.

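    A minimal forward cascade can be assembled from black-box scikit-learn classifiers (a sketch: the two subtasks are synthetic stand-ins, and the paper's feedback step, which retrains the first layer on the error modes exposed by the second, is omitted):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # two correlated "subtasks" that share the same raw input
        X, y_a = make_classification(n_samples=600, n_features=20, random_state=0)
        y_b = (X[:, 0] + X[:, 1] > 0).astype(int)
        X_tr, X_te, ya_tr, ya_te, yb_tr, yb_te = train_test_split(
            X, y_a, y_b, test_size=0.5, random_state=0)

        # layer 1: one black-box classifier per subtask
        l1_a = LogisticRegression(max_iter=1000).fit(X_tr, ya_tr)
        l1_b = LogisticRegression(max_iter=1000).fit(X_tr, yb_tr)

        def with_context(X):
            """Layer-2 input = raw features + layer-1 outputs of ALL subtasks."""
            return np.hstack([X, l1_a.predict_proba(X), l1_b.predict_proba(X)])

        # layer 2: repeated instantiations that see the correlated outputs
        l2_a = LogisticRegression(max_iter=1000).fit(with_context(X_tr), ya_tr)
        l2_b = LogisticRegression(max_iter=1000).fit(with_context(X_tr), yb_tr)
        print(l2_a.score(with_context(X_te), ya_te),
              l2_b.score(with_context(X_te), yb_te))
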
  • Tracking-Learning-Detection

    Page(s): 1409 - 1422

    This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (1) P-expert estimates missed detections, and (2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

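    The decomposition can be summarized as a loop. The sketch below is skeletal: tracker and detector are hypothetical objects standing in for the real TLD components (tracker.track returns a box or None, detector.detect returns candidate boxes, and add_positive/add_negative update the detector's model), and the fusion rule is deliberately simplistic:

        def iou(a, b):
            """Overlap of two boxes given as (x1, y1, x2, y2)."""
            ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = ix * iy
            area = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area - inter + 1e-9)

        def run_tld(frames, init_box, tracker, detector):
            box = init_box
            for frame in frames:
                t_box = tracker.track(frame, box) if box is not None else None
                d_boxes = detector.detect(frame)
                if t_box is not None and any(iou(t_box, d) > 0.5 for d in d_boxes):
                    box = t_box                       # detector confirms the tracker
                elif d_boxes:
                    box = d_boxes[0]                  # re-initialize from the detector
                else:
                    box = None                        # object not visible
                if box is not None:
                    # P-expert: the validated location is a positive example,
                    # covering appearances the detector missed
                    detector.add_positive(frame, box)
                    # N-expert: detections far from the validated state are
                    # false alarms and become negative examples
                    for d in d_boxes:
                        if iou(d, box) < 0.2:
                            detector.add_negative(frame, d)
                yield box
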
  • Trainable Convolution Filters and Their Application to Face Recognition

    Page(s): 1423 - 1436
    Multimedia

    In this paper, we present a novel image classification system that is built around a core of trainable filter ensembles that we call Volterra kernel classifiers. Our system treats images as a collection of possibly overlapping patches and is composed of three components: (1) a scheme for single-patch classification that seeks a smooth, possibly nonlinear, functional mapping of the patches into a range space, where patches of the same class are close to one another while patches from different classes are far apart in the L2 sense. This mapping is accomplished using trainable convolution filters (or Volterra kernels), where the convolution kernel can be of any shape or order. (2) Given a corpus of Volterra classifiers with various kernel orders and shapes for each patch, a boosting scheme for automatically selecting the best weighted combination of the classifiers to achieve a higher per-patch classification rate. (3) A scheme for aggregating the classification information obtained for each patch, via voting, into the parent image classification. We demonstrate the effectiveness of the proposed technique using face recognition as an application area and provide extensive experiments on the Yale, CMU PIE, Extended Yale B, Multi-PIE, and MERL Dome benchmark face data sets. We call the Volterra kernel classifiers applied to face recognition Volterrafaces. We show that our technique, which falls into the broad class of embedding-based face image discrimination methods, consistently outperforms various state-of-the-art methods in the same category.

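    The reason the kernels are trainable with standard linear machinery is that a second-order Volterra response is linear in the kernel weights once a patch is lifted to its quadratic feature map, as this small sketch illustrates (names illustrative):

        import numpy as np
        from itertools import combinations_with_replacement

        def volterra_features(patch):
            """Quadratic feature map of a patch: the response w1.x + x'W2x is
            linear in (w1, W2), so the kernels can be learned by any linear
            method (the paper learns them to compress same-class distances)."""
            x = patch.ravel()
            quad = np.array([x[i] * x[j] for i, j in
                             combinations_with_replacement(range(x.size), 2)])
            return np.concatenate([x, quad])

        phi = volterra_features(np.arange(9.0).reshape(3, 3))
        print(phi.shape)   # a 3x3 window gives 9 linear + 45 quadratic terms
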
  • A Closed-Form Solution to Retinex with Nonlocal Texture Constraints

    Page(s): 1437 - 1444

    We propose a method for intrinsic image decomposition based on retinex theory and texture analysis. While most previous methods approach this problem by analyzing local gradient properties, our technique additionally identifies distant pixels with the same reflectance through texture analysis and uses these nonlocal reflectance constraints to significantly reduce ambiguity in the decomposition. We formulate the decomposition problem as the minimization of a quadratic function which incorporates both the retinex constraint and our nonlocal texture constraint. This optimization can be solved in closed form with the standard conjugate gradient algorithm. Extensive experimentation with comparisons to previous techniques validates our method in terms of both decomposition accuracy and runtime efficiency.

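    Computationally, the closed-form claim amounts to solving one sparse, symmetric positive definite linear system. A structurally similar toy problem (the operators below are random stand-ins, not the paper's retinex and texture matrices):

        import numpy as np
        from scipy.sparse import eye as speye, random as sprandom
        from scipy.sparse.linalg import cg

        # minimize E(r) = ||G r - g||^2 + w ||C r||^2, a retinex-style gradient
        # term plus nonlocal equality constraints, via the normal equations
        rng = np.random.default_rng(0)
        n = 500
        G = sprandom(n, n, density=0.01, random_state=0) + speye(n)
        C = sprandom(n // 5, n, density=0.02, random_state=1)
        g = rng.normal(size=n)
        w = 10.0

        A = (G.T @ G + w * (C.T @ C)).tocsr()   # sparse SPD system matrix
        b = G.T @ g
        r, info = cg(A, b)                      # conjugate gradient solve
        print(info, float(np.linalg.norm(A @ r - b)))
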
  • A Robust O(n) Solution to the Perspective-n-Point Problem

    Page(s): 1444 - 1450

    We propose a noniterative solution for the Perspective-n-Point (PnP) problem, which can robustly retrieve the optimum by solving a seventh order polynomial. The central idea consists of three steps: 1) to divide the reference points into 3-point subsets in order to achieve a series of fourth order polynomials, 2) to compute the sum of the squares of the polynomials so as to form a cost function, and 3) to find the roots of the derivative of the cost function in order to determine the optimum. The advantages of the proposed method are as follows: First, it can stably deal with the planar case, the ordinary 3D case, and the quasi-singular case, and it is as accurate as the state-of-the-art iterative algorithms with much less computational time. Second, it is the first noniterative PnP solution that can achieve more accurate results than the iterative algorithms when no redundant reference points can be used (n ≤ 5). Third, large-size point sets can be handled efficiently because its computational complexity is O(n).

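    The three steps map directly onto polynomial arithmetic, as the sketch below shows with stand-in quartics (in the actual method each quartic is the residual of a 3-point subset and the variable has a geometric meaning):

        import numpy as np
        from numpy.polynomial import polynomial as P

        rng = np.random.default_rng(0)
        quartics = [rng.normal(size=5) for _ in range(6)]  # coeffs, low to high

        cost = np.zeros(9)                                 # degree-8 cost function
        for f in quartics:
            cost = P.polyadd(cost, P.polymul(f, f))        # sum of squared quartics

        stat = P.polyroots(P.polyder(cost))                # derivative: degree 7
        real = stat[np.abs(stat.imag) < 1e-9].real         # real stationary points
        t_opt = real[np.argmin(P.polyval(real, cost))]     # pick the global minimum
        print(t_opt, P.polyval(t_opt, cost))
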
  • An Extended Path Following Algorithm for Graph-Matching Problem

    Page(s): 1451 - 1456

    The path following algorithm was proposed recently to approximately solve matching problems on undirected graph models, and it exhibits state-of-the-art matching accuracy. In this paper, we extend the path following algorithm to matching problems on directed graph models by proposing a concave relaxation for the problem. Based on the concave and convex relaxations, a series of objective functions is constructed, and the Frank-Wolfe algorithm is then utilized to minimize them. Several experiments on synthetic and real data demonstrate the validity of the extended path following algorithm.

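    The Frank-Wolfe building block is simple to sketch for the undirected case (illustrative only; the paper's contribution is the concave relaxation and the path following schedule between the convex and concave ends, which is what handles directed graphs):

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def frank_wolfe_match(A1, A2, n_iter=100):
            """Frank-Wolfe on the doubly stochastic relaxation of graph matching:
            minimize ||A1 P - P A2||_F^2 over the Birkhoff polytope. Each step
            solves an assignment problem on the gradient and takes a convex step."""
            n = A1.shape[0]
            P = np.full((n, n), 1.0 / n)                  # barycenter start
            for k in range(n_iter):
                R = A1 @ P - P @ A2
                grad = 2.0 * (A1.T @ R - R @ A2.T)
                rows, cols = linear_sum_assignment(grad)  # best vertex for <grad, S>
                S = np.zeros((n, n)); S[rows, cols] = 1.0
                P += 2.0 / (k + 2.0) * (S - P)            # standard FW step size
            cols = linear_sum_assignment(-P)[1]           # round to a permutation
            return np.argsort(cols)                       # p with A2 = A1[p][:, p]

        rng = np.random.default_rng(0)
        A1 = rng.random((8, 8)); A1 = (A1 + A1.T) / 2
        perm = rng.permutation(8)
        A2 = A1[np.ix_(perm, perm)]
        print(np.array_equal(frank_wolfe_match(A1, A2), perm))  # True
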
  • [Inside back cover]

    Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois