IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 11 • November 2012

  • [Front cover]

    Publication Year: 2012 , Page(s): c1
    PDF (176 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2012 , Page(s): c2
    PDF (202 KB)
    Freely Available from IEEE
  • In Memoriam: Mark Everingham

    Publication Year: 2012 , Page(s): 2081 - 2082
    PDF (61 KB) | HTML
    Freely Available from IEEE
  • A Concatenational Graph Evolution Aging Model

    Publication Year: 2012 , Page(s): 2083 - 2096
    Cited by:  Papers (2)
    PDF (3952 KB) | HTML

    Modeling the long-term face aging process is of great importance for face recognition and animation, but there is a lack of sufficient long-term face aging sequences for model learning. To address this problem, we propose a CONcatenational GRaph Evolution (CONGRE) aging model, which adopts a decomposition strategy in both the spatial and temporal aspects to learn long-term aging patterns from partially dense aging databases. In the spatial aspect, we build a graphical face representation in which a human face is decomposed into mutually interrelated subregions under anatomical guidance. In the temporal aspect, the long-term evolution of the above graphical representation is then modeled by connecting sequential short-term patterns following the Markov property of the aging process, under smoothness constraints between neighboring short-term patterns and consistency constraints among subregions. The proposed model also accounts for the diversity of face aging by introducing a probabilistic concatenation strategy between short-term patterns and applying stochastic sampling in aging prediction. In experiments, the aging prediction results generated by the learned aging models are evaluated both subjectively and objectively to validate the proposed model.

  • A Minimal Solution for the Extrinsic Calibration of a Camera and a Laser-Rangefinder

    Publication Year: 2012 , Page(s): 2097 - 2107
    Cited by:  Papers (10)
    PDF (1141 KB) | HTML

    This paper presents a new algorithm for the extrinsic calibration of a perspective camera and an invisible 2D laser-rangefinder (LRF). The calibration is achieved by freely moving a checkerboard pattern in order to obtain plane poses in camera coordinates and depth readings in the LRF reference frame. The problem of estimating the rigid displacement between the two sensors is formulated as one of registering a set of planes and lines in 3D space. It is proven for the first time that the alignment of three plane-line correspondences has at most eight solutions, which can be determined by solving a standard P3P problem and a linear system of equations. This leads to a minimal closed-form solution for the extrinsic calibration that can be used as a hypothesis generator in a RANSAC paradigm. Our calibration approach is validated through simulation and real experiments that show its superiority over the current state-of-the-art method, which requires a minimum of five input planes.

  • A Model-Based Sequence Similarity with Application to Handwritten Word Spotting

    Publication Year: 2012 , Page(s): 2108 - 2120
    Cited by:  Papers (4)
    PDF (2597 KB) | HTML

    This paper proposes a novel similarity measure between vector sequences. We work in the framework of model-based approaches, where each sequence is first mapped to a Hidden Markov Model (HMM) and then a measure of similarity is computed between the HMMs. We propose to model sequences with semicontinuous HMMs (SC-HMMs), a particular type of HMM whose emission probabilities in each state are mixtures of shared Gaussians. This crucial constraint provides two major benefits. First, the a priori information contained in the common set of Gaussians leads to a more accurate estimate of the HMM parameters. Second, the computation of a similarity between two SC-HMMs can be simplified to a Dynamic Time Warping (DTW) between their mixture weight vectors, which significantly reduces the computational cost. Experiments are carried out on a handwritten word retrieval task in three different datasets: an in-house dataset of real handwritten letters, the George Washington dataset, and the IFN/ENIT dataset of Arabic handwritten words. These experiments show that the proposed similarity outperforms both the traditional DTW between the original sequences and the model-based approach that uses ordinary continuous HMMs. We also show that this increase in accuracy can be traded against a significant reduction of the computational cost.

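    A minimal sketch of the similarity computation described above, reduced to its core step: dynamic time warping between two sequences of mixture-weight vectors. The SC-HMM training itself is omitted, and the state counts, shared-Gaussian count, and random weights below are illustrative stand-ins.

        import numpy as np

        def dtw_distance(A, B):
            """DTW between two sequences of vectors A (m x d) and B (n x d)."""
            m, n = len(A), len(B)
            D = np.full((m + 1, n + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    cost = np.linalg.norm(A[i - 1] - B[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[m, n]

        # Hypothetical mixture-weight sequences for two SC-HMMs sharing 16 Gaussians:
        rng = np.random.default_rng(0)
        w1 = rng.dirichlet(np.ones(16), size=8)    # 8 states, weights sum to 1
        w2 = rng.dirichlet(np.ones(16), size=10)   # 10 states
        print("model-based distance (lower = more similar):", dtw_distance(w1, w2))
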
  • A Quantitative Evaluation of Confidence Measures for Stereo Vision

    Publication Year: 2012 , Page(s): 2121 - 2133
    Cited by:  Papers (10)
    PDF (2145 KB) | HTML

    We present an extensive evaluation of 17 confidence measures for stereo matching that compares the most widely used measures as well as several novel techniques proposed here. We begin by categorizing these methods according to which aspects of stereo cost estimation they take into account and then assess their strengths and weaknesses. The evaluation is conducted using a winner-take-all framework on binocular and multibaseline datasets with ground truth. It measures the capability of each confidence method to rank depth estimates according to their likelihood of being correct, to detect occluded pixels, and to generate low-error depth maps by selecting among multiple hypotheses for each pixel. Our work was motivated by the observation that such an evaluation is missing from the rapidly maturing stereo literature and that our findings would be helpful to researchers in binocular and multiview stereo.

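    As a toy illustration of the kind of measure evaluated above, the sketch below computes a naive peak-ratio confidence from a matching-cost volume and uses it to rank winner-take-all disparity estimates; the cost volume, image size, and disparity range are invented for the example.

        import numpy as np

        def peak_ratio(cost_volume):
            """Naive peak-ratio confidence: ratio of the second-smallest to the
            smallest matching cost per pixel (high when the best cost stands out)."""
            sorted_costs = np.sort(cost_volume, axis=2)
            c1, c2 = sorted_costs[..., 0], sorted_costs[..., 1]
            return c2 / (c1 + 1e-9)

        rng = np.random.default_rng(1)
        costs = rng.random((4, 5, 32))          # hypothetical 4x5 image, 32 disparities
        disparity = costs.argmin(axis=2)        # winner-take-all disparity map
        confidence = peak_ratio(costs)

        # Rank pixels by confidence, e.g. to keep only the most reliable estimates:
        order = np.argsort(-confidence.ravel())
        best = np.unravel_index(order[0], confidence.shape)
        print("most confident pixel (row, col):", best, "disparity:", disparity[best])
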
  • Embedding Retrieval of Articulated Geometry Models

    Publication Year: 2012 , Page(s): 2134 - 2146
    Multimedia
    PDF (2303 KB) | HTML

    Due to the popularity of computer games and animation, research on 3D articulated geometry model retrieval has attracted a lot of attention in recent years. However, most existing works extract high-dimensional features to represent models and suffer from practical limitations. First, misalignment in high-dimensional features may produce unreliable euclidean distances and affect retrieval accuracy. Second, the curse of dimensionality also degrades efficiency. In this paper, we propose an embedding retrieval framework to improve the practicability of these methods. It is based on a manifold learning technique, the Diffusion Map (DM). We project all pairwise distances onto a low-dimensional space. This improves retrieval accuracy because intercluster distances are exaggerated. Then we adapt the Density-Weighted Nyström extension and further propose a novel step to locally align the Nyström embedding to the eigensolver embedding so as to reduce extension error and preserve retrieval accuracy. Finally, we propose a heuristic to handle disconnected manifolds by augmenting the kernel matrix with multiple similarity measures and shortcut edges, and further discuss the choice of DM parameters. We have incorporated two existing matching algorithms for testing. Our experimental results show improvement in precision at high recalls and in speed. Our work provides a robust retrieval framework for the matching of multimedia data that lie on manifolds.

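    A self-contained sketch of the diffusion-map step described above, turning a pairwise distance matrix into a low-dimensional embedding. The Nyström extension, local alignment, and multiple-measure kernel augmentation from the paper are omitted, and the kernel width, diffusion time, and data are illustrative.

        import numpy as np

        def diffusion_map(D, dim=2, sigma=1.0, t=1):
            """Embed items with pairwise distance matrix D (n x n) via a diffusion map."""
            K = np.exp(-(D ** 2) / (2.0 * sigma ** 2))     # Gaussian kernel
            P = K / K.sum(axis=1, keepdims=True)           # row-stochastic Markov matrix
            vals, vecs = np.linalg.eig(P)
            idx = np.argsort(-vals.real)
            vals, vecs = vals.real[idx], vecs.real[:, idx]
            # Drop the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
            return vecs[:, 1:dim + 1] * (vals[1:dim + 1] ** t)

        rng = np.random.default_rng(2)
        X = rng.random((30, 5))                            # hypothetical feature vectors
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        Y = diffusion_map(D, dim=2)
        print("embedded coordinates:", Y.shape)
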
  • Empirical Mode Decomposition Analysis for Visual Stylometry

    Publication Year: 2012 , Page(s): 2147 - 2157
    Cited by:  Papers (1)
    PDF (1697 KB) | HTML

    In this paper, we show how the tools of empirical mode decomposition (EMD) analysis can be applied to the problem of “visual stylometry,” generally defined as the development of quantitative tools for the measurement and comparison of individual style in the visual arts. In particular, we introduce a new form of EMD analysis for images and show that it is possible to use its output as the basis for the construction of effective support vector machine (SVM)-based stylometric classifiers. We present the methodology and then test it on collections of two sets of digital captures of drawings: a set of authentic works and well-known imitations of works attributed to the great Flemish artist Pieter Bruegel the Elder (1525-1569), and a set of works attributed to the Dutch master Rembrandt van Rijn (1606-1669) and his pupils. Our positive results indicate that EMD-based methods may hold promise generally as a technique for visual stylometry.

  • Fast Recursive Computation of 3D Geometric Moments from Surface Meshes

    Publication Year: 2012 , Page(s): 2158 - 2163
    Cited by:  Papers (1)
    PDF (304 KB) | HTML

    A new exact algorithm is proposed to compute the 3D geometric moments of a homogeneous shape defined by an unstructured triangulation of its surface. This algorithm relies on the analytical integration of the moments on tetrahedra defined by the surface triangles and a central point and on a set of recurrent relationships between the corresponding integrals, and achieves linear running time complexities with respect to the number of triangles in the surface mesh and with respect to the number of moments that are computed. This effectively reduces the complexity for computing moments up to order N from N^6 to N^3 with respect to the fastest previously proposed exact algorithm.

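    The lowest-order case of the computation described above can be written compactly: each surface triangle and a central point (here the origin) define a signed tetrahedron, and summing analytic tetrahedral integrals gives exact zeroth- and first-order moments (volume and centroid) of a closed, consistently oriented mesh. This is only the base case, not the paper's recurrence for moments of arbitrary order N.

        import numpy as np

        def low_order_moments(vertices, triangles):
            """Exact volume (m000) and first-order moments of a closed triangle mesh,
            accumulated over signed tetrahedra (origin, v0, v1, v2)."""
            m000 = 0.0
            m1 = np.zeros(3)                               # [m100, m010, m001]
            for i, j, k in triangles:
                v0, v1, v2 = vertices[i], vertices[j], vertices[k]
                vol = np.dot(v0, np.cross(v1, v2)) / 6.0   # signed tetra volume
                m000 += vol
                m1 += vol * (v0 + v1 + v2) / 4.0           # tetra centroid (4th vertex = origin)
            return m000, m1

        # Unit cube with outward-oriented triangle faces:
        V = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                      [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], float)
        T = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7), (0, 1, 5), (0, 5, 4),
             (1, 2, 6), (1, 6, 5), (2, 3, 7), (2, 7, 6), (3, 0, 4), (3, 4, 7)]
        vol, first = low_order_moments(V, T)
        print("volume:", vol, "centroid:", first / vol)    # expect 1.0 and [0.5 0.5 0.5]
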
  • Human Identification Using Temporal Information Preserving Gait Template

    Publication Year: 2012 , Page(s): 2164 - 2176
    Cited by:  Papers (10)
    PDF (2064 KB) | HTML

    Gait Energy Image (GEI) is an efficient template for human identification by gait. However, such a template loses temporal information in a gait sequence, which is critical to the performance of gait recognition. To address this issue, we develop a novel temporal template, named Chrono-Gait Image (CGI), in this paper. The proposed CGI template first extracts the contour in each gait frame, followed by encoding each of the gait contour images in the same gait sequence with a multichannel mapping function and compositing them into a single CGI. To make the templates robust to a complex surrounding environment, we also propose CGI-based real and synthetic temporal information preserving templates by using different gait periods and contour distortion techniques. Extensive experiments on three benchmark gait databases indicate that, compared with the recently published gait recognition approaches, our CGI-based temporal information preserving approach achieves competitive performance in gait recognition with robustness and efficiency.

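    A rough sketch of the temporal-encoding idea behind the CGI template: binary contour frames from one gait period are colored by a frame-index-dependent mapping and composited into a single image. The contour extraction, color ramp, and period segmentation below are illustrative stand-ins rather than the paper's exact functions.

        import numpy as np

        def chrono_composite(contour_frames):
            """Composite binary contour frames (T x H x W) into one H x W x 3 image,
            encoding time in the color channels (early frames red, late frames blue)."""
            T, H, W = contour_frames.shape
            out = np.zeros((H, W, 3))
            for t, frame in enumerate(contour_frames):
                a = t / max(T - 1, 1)                      # normalized time in [0, 1]
                color = np.array([1.0 - a, 0.0, a])        # simple red-to-blue ramp
                out += frame[..., None] * color
            return out / out.max() if out.max() > 0 else out

        # Hypothetical "contours": a vertical bar drifting across the frame over time.
        T, H, W = 10, 64, 64
        frames = np.zeros((T, H, W))
        for t in range(T):
            frames[t, :, 10 + 4 * t] = 1.0
        cgi = chrono_composite(frames)
        print("CGI-like template shape:", cgi.shape)
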
  • Learning Image Similarity from Flickr Groups Using Fast Kernel Machines

    Publication Year: 2012 , Page(s): 2177 - 2188
    Cited by:  Papers (5)
    PDF (4812 KB) | HTML

    Measuring image similarity is a central topic in computer vision. In this paper, we propose to measure image similarity by learning from online Flickr image groups. We do so by choosing 103 Flickr groups, building a one-versus-all multiclass classifier to classify test images into these groups, taking the set of classifier responses as features, and calculating the distance between feature vectors to measure image similarity. Experimental results on the Corel dataset and the PASCAL VOC 2007 dataset show that our approach performs better on image matching, retrieval, and classification than using conventional visual features. To build our similarity measure, we need one-versus-all classifiers that are accurate and can be trained quickly on very large quantities of data. We adopt an SVM classifier with a histogram intersection kernel and describe a novel fast training algorithm for this classifier: the Stochastic Intersection Kernel MAchine (SIKMA) training algorithm. This method can produce a kernel classifier that is more accurate than a linear classifier, trained on tens of thousands of examples in minutes.

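    A minimal sketch of the response-vector idea described above, substituting scikit-learn's linear SVM for the paper's SIKMA-trained intersection-kernel machines and random data for Flickr groups; the group count, feature dimension, and distance choice are illustrative.

        import numpy as np
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(3)
        n_groups, dim = 10, 50                      # the paper uses 103 Flickr groups
        X_train = rng.random((500, dim))            # stand-in for image features
        y_train = rng.integers(0, n_groups, 500)    # stand-in for group memberships

        # One-versus-rest linear classifiers, one per "group".
        clf = LinearSVC(C=1.0).fit(X_train, y_train)

        def similarity(img_a, img_b):
            """Two images are similar if their vectors of classifier responses are close."""
            r = clf.decision_function(np.vstack([img_a, img_b]))   # (2, n_groups)
            return -np.linalg.norm(r[0] - r[1])                    # higher = more similar

        a, b = rng.random((1, dim)), rng.random((1, dim))
        print("similarity(a, b) =", similarity(a, b))
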
  • Measuring the Objectness of Image Windows

    Publication Year: 2012 , Page(s): 2189 - 2202
    Cited by:  Papers (44)
    PDF (3417 KB) | HTML

    We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on the challenging PASCAL VOC 07 dataset, we show this new cue to outperform a state-of-the-art saliency measure, and the combined objectness measure to perform better than any cue alone. We also compare to interest point operators, a HOG detector, and three recent works aiming at automatic object segmentation. Finally, we present two applications of objectness. In the first, we sample a small number of windows according to their objectness probability and give an algorithm to employ them as location priors for modern class-specific object detectors. As we show experimentally, this greatly reduces the number of windows evaluated by the expensive class-specific model. In the second application, we use objectness as a complementary score in addition to the class-specific model, which leads to fewer false positives. As shown in several recent papers, objectness can act as a valuable focus of attention mechanism in many other applications operating on image windows, including weakly supervised learning of object categories, unsupervised pixelwise segmentation, and object tracking in video. Computing objectness is very efficient and takes only about 4 sec. per image.

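    A toy sketch of the Bayesian cue combination described above: per-window cue values are combined under a naive-Bayes assumption into a posterior probability of containing an object, which can then be used to keep or sample a small set of promising windows. The cue bins, likelihood tables, and prior below are invented; the paper's actual cues include multiscale saliency, color contrast, edge density, and superpixel straddling.

        import numpy as np

        def objectness_posterior(cue_bins, like_obj, like_bg, prior_obj=0.1):
            """Naive-Bayes combination of independent cues.
            cue_bins: (n_windows, n_cues) cue values quantized into {0, ..., B-1};
            like_obj, like_bg: (n_cues, B) tables p(bin | object), p(bin | background)."""
            p_obj = np.full(len(cue_bins), prior_obj)
            p_bg = np.full(len(cue_bins), 1.0 - prior_obj)
            for j in range(cue_bins.shape[1]):
                p_obj *= like_obj[j, cue_bins[:, j]]
                p_bg *= like_bg[j, cue_bins[:, j]]
            return p_obj / (p_obj + p_bg)

        rng = np.random.default_rng(4)
        n_windows, n_cues, n_bins = 1000, 3, 8
        bins = rng.integers(0, n_bins, (n_windows, n_cues))        # made-up cue values
        like_obj = rng.dirichlet(np.ones(n_bins), size=n_cues)     # made-up likelihoods
        like_bg = rng.dirichlet(np.ones(n_bins), size=n_cues)

        post = objectness_posterior(bins, like_obj, like_bg)
        top = np.argsort(-post)[:100]     # keep the 100 most object-like windows
        print("highest posterior:", post[top[0]])
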
  • Minimum-Distortion Isometric Shape Correspondence Using EM Algorithm

    Publication Year: 2012 , Page(s): 2203 - 2215
    Cited by:  Papers (4)
    PDF (1763 KB) | HTML

    We present a purely isometric method that establishes 3D correspondence between two (nearly) isometric shapes. Our method evenly samples high-curvature vertices from the given mesh representations, and then seeks an injective mapping from one vertex set to the other that minimizes the isometric distortion. We formulate the problem of shape correspondence as combinatorial optimization over the domain of all possible mappings, which then reduces in a probabilistic setting to a log-likelihood maximization problem that we solve via the Expectation-Maximization (EM) algorithm. The EM algorithm is initialized in the spectral domain by transforming the sampled vertices via classical Multidimensional Scaling (MDS). Minimization of the isometric distortion, and hence maximization of the log-likelihood function, is then achieved in the original 3D euclidean space, for each iteration of the EM algorithm, in two steps: by first using bipartite perfect matching, and then a greedy optimization algorithm. The optimal mapping obtained at convergence can be one-to-one or many-to-one upon choice. We demonstrate the performance of our method on various isometric (or nearly isometric) pairs of shapes for some of which the ground-truth correspondence is available.

  • Proximity-Based Frameworks for Generating Embeddings from Multi-Output Data

    Publication Year: 2012 , Page(s): 2216 - 2232
    Cited by:  Papers (6)
    PDF (2842 KB) | HTML

    This paper is about supervised and semi-supervised dimensionality reduction (DR) by generating spectral embeddings from multi-output data based on pairwise proximity information. Two flexible and generic frameworks are proposed to achieve supervised DR (SDR) for multilabel classification. One is able to extend any existing single-label SDR method to multilabel via sample duplication, referred to as MESD. The other is a multilabel design framework that tackles the SDR problem by computing weight (proximity) matrices based on simultaneous feature and label information, referred to as MOPE, as a generalization of many current techniques. A diverse set of schemes for label-based proximity calculation, as well as a mechanism for combining label-based and feature-based weight information by considering information importance and prioritization, is proposed for MOPE. Additionally, we summarize many current spectral methods for unsupervised DR (UDR), single/multilabel SDR, and semi-supervised DR (SSDR) and express them under a common template representation as a general guide to researchers in the field. We also propose a general framework for achieving SSDR by combining existing SDR and UDR models, as well as a procedure for reducing the computational cost via learning with a target set of relation features. The effectiveness of our proposed methodologies is demonstrated in experiments on document collections for multilabel text categorization from the natural language processing domain.

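    The sample-duplication idea behind MESD can be illustrated in a few lines: each multilabel example is duplicated once per assigned label, so that any existing single-label supervised DR method can be applied unchanged. The data below is synthetic and the downstream DR step is omitted.

        import numpy as np

        def duplicate_for_multilabel(X, Y):
            """Expand a multilabel dataset into a single-label one by sample duplication.
            X: (n, d) features; Y: (n, L) binary label-indicator matrix."""
            rows, labels = np.nonzero(Y)          # one (sample, label) pair per assignment
            return X[rows], labels

        X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
        Y = np.array([[1, 0, 1],                  # sample 0 carries labels 0 and 2
                      [0, 1, 0],
                      [1, 1, 1]])
        X_dup, y_single = duplicate_for_multilabel(X, Y)
        print(X_dup.shape, y_single)              # (6, 2) and [0 2 1 0 1 2]
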
  • RASL: Robust Alignment by Sparse and Low-Rank Decomposition for Linearly Correlated Images

    Publication Year: 2012 , Page(s): 2233 - 2246
    Cited by:  Papers (39)
    PDF (1909 KB) | HTML

    This paper studies the problem of simultaneously aligning a batch of linearly correlated images despite gross corruption (such as occlusion). Our method seeks an optimal set of image domain transformations such that the matrix of transformed images can be decomposed as the sum of a sparse matrix of errors and a low-rank matrix of recovered aligned images. We reduce this extremely challenging optimization problem to a sequence of convex programs that minimize the sum of the ℓ1-norm and the nuclear norm of the two component matrices, which can be efficiently solved by scalable convex optimization techniques. We verify the efficacy of the proposed robust alignment algorithm with extensive experiments on both controlled and uncontrolled real data, demonstrating higher accuracy and efficiency than existing methods over a wide range of realistic misalignments and corruptions.

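    A compact sketch of the low-rank-plus-sparse decomposition at the heart of the method above (principal component pursuit solved with a simple augmented-Lagrangian iteration), leaving out the image-transformation updates that perform the actual alignment; the penalty parameters, iteration count, and data are illustrative.

        import numpy as np

        def low_rank_sparse(D, lam=None, mu=None, n_iter=200):
            """Decompose D into L + S, with L low rank and S sparse, by approximately
            minimizing ||L||_* + lam * ||S||_1 subject to D = L + S (inexact ALM)."""
            m, n = D.shape
            lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
            mu = mu if mu is not None else 0.25 * m * n / np.abs(D).sum()
            soft = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
            L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
            for _ in range(n_iter):
                U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
                L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt    # singular-value thresholding
                S = soft(D - L + Y / mu, lam / mu)              # entrywise soft thresholding
                Y = Y + mu * (D - L - S)                        # dual update
            return L, S

        rng = np.random.default_rng(5)
        aligned = rng.random((40, 3)) @ rng.random((3, 30))     # rank-3 "aligned images"
        errors = (rng.random((40, 30)) < 0.05) * 5.0            # gross but sparse corruption
        L, S = low_rank_sparse(aligned + errors)
        print("recovered rank:", np.linalg.matrix_rank(L, tol=1e-3))
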
  • Recognizing Gestures by Learning Local Motion Signatures of HOG Descriptors

    Publication Year: 2012 , Page(s): 2247 - 2258
    Cited by:  Papers (9)
    PDF (1065 KB) | HTML

    We introduce a new gesture recognition framework based on learning local motion signatures (LMSs) of HOG descriptors introduced by [1]. Our main contribution is a new probabilistic learning-classification scheme based on reliable tracking of local features. After generating these LMSs, computed for one individual by tracking Histograms of Oriented Gradients (HOG) [2] descriptors, we learn a codebook of video-words (i.e., clusters of LMSs) using the k-means algorithm on a learning gesture video database. The video-words are then compacted into a codebook of codewords by the Maximization of Mutual Information (MMI) algorithm. In the final step, we compare the LMSs generated for a new gesture against the learned codebook via the k-nearest neighbors (k-NN) algorithm and a novel voting strategy, which handles the N-to-N mapping between codewords and gesture labels. Experiments have been carried out on two public gesture databases, KTH [3] and IXMAS [4]. Results show that the proposed method outperforms recent state-of-the-art methods.

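    A stripped-down sketch of the codebook-and-voting pipeline described above, with random vectors standing in for tracked local motion signatures, plain k-means in place of the MMI compaction, and a simple k-NN majority vote in place of the paper's voting strategy.

        import numpy as np
        from scipy.cluster.vq import kmeans2, vq

        rng = np.random.default_rng(6)

        # Hypothetical LMS descriptors for labelled training gestures of two classes.
        train = [(rng.random((100, 16)) + c, c) for c in (0, 1) for _ in range(5)]

        codebook, _ = kmeans2(np.vstack([d for d, _ in train]), 20, seed=0)

        def bow_histogram(descriptors):
            """Histogram of codeword assignments ("video-words") for one gesture."""
            words, _ = vq(descriptors, codebook)
            return np.bincount(words, minlength=len(codebook)) / len(descriptors)

        H_train = np.array([bow_histogram(d) for d, _ in train])
        y_train = np.array([c for _, c in train])

        def classify(descriptors, k=3):
            """k-NN majority vote over the training histograms."""
            h = bow_histogram(descriptors)
            nn = np.argsort(np.linalg.norm(H_train - h, axis=1))[:k]
            return np.bincount(y_train[nn]).argmax()

        print("predicted class:", classify(rng.random((100, 16)) + 1))
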
  • Scalable Active Learning for Multiclass Image Classification

    Publication Year: 2012 , Page(s): 2259 - 2273
    Cited by:  Papers (11)
    PDF (1948 KB) | HTML

    Machine learning techniques for computer vision applications like object recognition, scene classification, etc., require a large number of training samples for satisfactory performance. Especially when classification is to be performed over many categories, providing enough training samples for each category is infeasible. This paper describes new ideas in multiclass active learning to deal with the training bottleneck, making it easier to train large multiclass image classification systems. First, we propose a new interaction modality for training which requires only yes-no type binary feedback instead of a precise category label. The modality is especially powerful in the presence of hundreds of categories. For the proposed modality, we develop a Value-of-Information (VOI) algorithm that chooses informative queries while also considering user annotation cost. Second, we propose an active selection measure that works with many categories and is extremely fast to compute. This measure is employed to perform a fast seed search before computing VOI, resulting in an algorithm that scales linearly with dataset size. Third, we use locality sensitive hashing to provide a very fast approximation to active learning, which gives sublinear time scaling, allowing application to very large datasets. The approximation provides up to two orders of magnitude speedups with little loss in accuracy. Thorough empirical evaluation of classification accuracy, noise sensitivity, imbalanced data, and computational performance on a diverse set of image datasets demonstrates the strengths of the proposed algorithms.

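    A very small stand-in for the active-selection loop described above (entropy-based uncertainty sampling with simulated yes/no feedback, rather than the paper's Value-of-Information criterion or hashing approximation), showing the shape of the query loop; the classifier, data, and budget are invented.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(7)
        X = np.vstack([rng.normal(c, 1.0, (200, 5)) for c in (0.0, 2.0, 4.0)])
        y = np.repeat([0, 1, 2], 200)                  # hidden true labels
        labeled = [0, 200, 400]                        # one seed example per class
        queried = set(labeled)

        for _ in range(20):                            # annotation budget of 20 queries
            clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
            proba = clf.predict_proba(X)
            entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
            entropy[list(queried)] = -np.inf           # never re-query the same sample
            q = int(entropy.argmax())                  # most uncertain remaining sample
            queried.add(q)
            guess = int(proba[q].argmax())
            if y[q] == guess:                          # simulated answer to the yes/no
                labeled.append(q)                      # question "is sample q class guess?"

        print("accuracy after the binary-feedback queries:", clf.score(X, y))
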
  • SLIC Superpixels Compared to State-of-the-Art Superpixel Methods

    Publication Year: 2012 , Page(s): 2274 - 2282
    Cited by:  Papers (207)
    PDF (4149 KB) | HTML

    Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.

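    A bare-bones version of the SLIC assignment-and-update loop described above, run on image intensity instead of CIELAB color and without the usual connectivity post-processing; the grid step, compactness weight, and iteration count are the standard free parameters and the values below are illustrative.

        import numpy as np

        def slic_gray(img, step=20, compactness=0.1, n_iter=5):
            """Simplified SLIC on a grayscale image: local k-means in (value, y, x) space."""
            H, W = img.shape
            ys, xs = np.mgrid[step // 2:H:step, step // 2:W:step]
            centers = np.stack([img[ys, xs].ravel(),
                                ys.ravel().astype(float),
                                xs.ravel().astype(float)], axis=1)   # (K, 3)
            labels = np.full((H, W), -1)
            yy, xx = np.mgrid[0:H, 0:W]
            for _ in range(n_iter):
                dist = np.full((H, W), np.inf)
                for k, (cv, cy, cx) in enumerate(centers):
                    # Search only a 2*step window around each cluster center.
                    y0, y1 = int(max(cy - step, 0)), int(min(cy + step, H))
                    x0, x1 = int(max(cx - step, 0)), int(min(cx + step, W))
                    dc2 = (img[y0:y1, x0:x1] - cv) ** 2                      # color term
                    ds2 = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
                    d = dc2 + (compactness / step) ** 2 * ds2                # squared SLIC distance
                    better = d < dist[y0:y1, x0:x1]
                    dist[y0:y1, x0:x1][better] = d[better]
                    labels[y0:y1, x0:x1][better] = k
                for k in range(len(centers)):                                # move centers
                    mask = labels == k
                    if mask.any():
                        centers[k] = [img[mask].mean(), yy[mask].mean(), xx[mask].mean()]
            return labels

        img = np.random.default_rng(8).random((100, 120))                    # toy "image"
        labels = slic_gray(img)
        print("number of superpixels:", len(np.unique(labels)))
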
  • Human Pose Co-Estimation and Applications

    Publication Year: 2012 , Page(s): 2282 - 2288
    Cited by:  Papers (4)
    Multimedia
    PDF (1104 KB) | HTML

    Most existing techniques for articulated Human Pose Estimation (HPE) consider each person independently. Here we tackle the problem in a new setting, coined Human Pose Coestimation (PCE), where multiple people are in a common, but unknown pose. The task of PCE is to estimate their poses jointly and to produce prototypes characterizing the shared pose. Since the poses of the individual people should be similar to the prototype, PCE has less freedom compared to estimating each pose independently, which simplifies the problem. We demonstrate our PCE technique on two applications. The first is estimating the pose of people performing the same activity synchronously, such as during aerobics, cheerleading, and dancing in a group. We show that PCE improves pose estimation accuracy over estimating each person independently. The second application is learning prototype poses characterizing a pose class directly from an image search engine queried by the class name (e.g., “lotus pose”). We show that PCE leads to better pose estimation in such images, and it learns meaningful prototypes which can be used as priors for pose estimation in novel images.

  • [Information for authors]

    Publication Year: 2012 , Page(s): c3
    PDF (202 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2012 , Page(s): c4
    PDF (176 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois