
IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume 36, Issue 3 • March 2014

  • [Table of contents]

    Page(s): c1
    PDF (274 KB)
    Freely Available from IEEE
  • [Front inside cover]

    Page(s): c2
    PDF (283 KB)
    Freely Available from IEEE
  • A Hierarchical Word-Merging Algorithm with Class Separability Measure

    Page(s): 417 - 435
    PDF (2671 KB) | HTML

    In image recognition with the bag-of-features model, a small-sized visual codebook is usually preferred to obtain a low-dimensional histogram representation and high computational efficiency. Such a visual codebook has to be discriminative enough to achieve excellent recognition performance. To create a compact and discriminative codebook, in this paper we propose to merge the visual words in a large-sized initial codebook by maximally preserving class separability. We first show that this results in a difficult optimization problem. To deal with this situation, we devise a suboptimal but very efficient hierarchical word-merging algorithm, which optimally merges two words at each level of the hierarchy. By exploiting the characteristics of the class separability measure and designing a novel indexing structure, the proposed algorithm can hierarchically merge 10,000 visual words down to two words in merely 90 seconds. Also, to show the properties of the proposed algorithm and reveal its advantages, we conduct detailed theoretical analysis to compare it with another hierarchical word-merging algorithm that maximally preserves mutual information, obtaining interesting findings. Experimental studies are conducted to verify the effectiveness of the proposed algorithm on multiple benchmark data sets. As shown, it can efficiently produce more compact and discriminative codebooks than the state-of-the-art hierarchical word-merging algorithms, especially when the size of the codebook is significantly reduced.

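    The abstract describes greedy pairwise merging under a class separability measure. The toy sketch below merges codebook words using a simple purity proxy; the paper's actual separability measure, indexing structure, and efficiency tricks are not reproduced, and all names are illustrative.

```python
import numpy as np

def greedy_word_merging(counts, target_size):
    """Greedily merge visual words down to target_size.

    counts: (num_words, num_classes) array; counts[w, c] is how often
    word w fires in training images of class c. The score used here is
    total 'purity', sum_w max_c p(c | w), a toy stand-in for the paper's
    class separability measure.
    """
    words = [counts[w].astype(float) for w in range(counts.shape[0])]

    def purity(ws):
        return sum(w.max() / w.sum() for w in ws if w.sum() > 0)

    while len(words) > target_size:
        best = None
        # Try every pair; keep the merge that loses the least separability.
        # This brute-force loop is exactly the cost the paper's indexing
        # structure is designed to avoid.
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                merged = words[i] + words[j]
                rest = [w for k, w in enumerate(words) if k not in (i, j)]
                score = purity(rest + [merged])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        words[i] = words[i] + words[j]
        del words[j]
    return words
```
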
  • Animated Pose Templates for Modeling and Detecting Human Actions

    Page(s): 436 - 452
    PDF (4438 KB) | HTML

    This paper presents animated pose templates (APTs) for detecting short-term, long-term, and contextual actions from cluttered scenes in videos. Each pose template consists of two components: 1) a shape template with deformable parts represented in an And-node whose appearances are represented by the Histogram of Oriented Gradient (HOG) features, and 2) a motion template specifying the motion of the parts by the Histogram of Optical-Flows (HOF) features. A shape template may have more than one motion template represented by an Or-node. Therefore, each action is defined as a mixture (Or-node) of pose templates in an And-Or tree structure. While this pose template is suitable for detecting short-term action snippets in two to five frames, we extend it in two ways: 1) For long-term actions, we animate the pose templates by adding temporal constraints in a Hidden Markov Model (HMM), and 2) for contextual actions, we treat contextual objects as additional parts of the pose templates and add constraints that encode spatial correlations between parts. To train the model, we manually annotate part locations on several keyframes of each video and cluster them into pose templates using EM. This leaves the unknown parameters for our learning algorithm in two groups: 1) latent variables for the unannotated frames, including pose-IDs and part locations, and 2) model parameters shared by all training samples, such as weights for HOG and HOF features, canonical part locations of each pose, and coefficients penalizing pose transitions and part deformations. To learn these parameters, we introduce a semi-supervised structural SVM algorithm that iterates between two steps: 1) learning (updating) model parameters using labeled data by solving a structural SVM optimization, and 2) imputing missing variables (i.e., detecting actions on unlabeled frames) with parameters learned from the previous step and progressively accepting high-score frames as newly labeled examples. This algorithm belongs to a family of optimization methods known as the Concave-Convex Procedure (CCCP) that converge to a local optimal solution. The inference algorithm consists of two components: 1) detecting top candidates for the pose templates, and 2) computing the sequence of pose templates. Both are done by dynamic programming or, more precisely, beam search. In experiments, we demonstrate that this method is capable of discovering salient poses of actions as well as interactions with contextual objects. We test our method on several public action data sets and a challenging outdoor contextual action data set that we collected ourselves. The results show that our model achieves comparable or better performance compared to state-of-the-art methods.

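    The inference step "computing the sequence of pose templates" is described as dynamic programming. A minimal Viterbi-style sketch, assuming per-frame template scores and a pairwise transition penalty as hypothetical inputs standing in for the paper's HOG/HOF scores and pose-transition coefficients:

```python
import numpy as np

def viterbi_pose_sequence(frame_scores, transition_penalty):
    """frame_scores: (T, K) detection score of each of K pose templates
    per frame; transition_penalty: (K, K) cost of switching templates.
    Returns the highest-scoring template sequence via dynamic programming.
    """
    T, K = frame_scores.shape
    dp = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    dp[0] = frame_scores[0]
    for t in range(1, T):
        # cand[i, j]: best score ending in template i, then moving to j
        cand = dp[t - 1][:, None] - transition_penalty
        back[t] = cand.argmax(axis=0)
        dp[t] = frame_scores[t] + cand.max(axis=0)
    seq = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):          # backtrack
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]
```
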
  • Attribute-Based Classification for Zero-Shot Visual Object Categorization

    Page(s): 453 - 465
    PDF (3976 KB) | HTML

    We study the problem of object recognition for categories for which we have no training examples, a task also called zero-data or zero-shot learning. This situation has hardly been studied in computer vision research, even though it occurs frequently; the world contains tens of thousands of different object classes, and image collections have been formed and suitably annotated for only a few of them. To tackle the problem, we introduce attribute-based classification: Objects are identified based on a high-level description that is phrased in terms of semantic attributes, such as the object's color or shape. Because the identification of each such property transcends the specific learning task at hand, the attribute classifiers can be prelearned independently, for example, from existing image data sets unrelated to the current task. Afterward, new classes can be detected based on their attribute representation, without the need for a new training phase. In this paper, we also introduce a new data set, Animals with Attributes, of over 30,000 images of 50 animal classes, annotated with 85 semantic attributes. Extensive experiments on this and two more data sets show that attribute-based classification indeed is able to categorize images without access to any training images of the target classes.

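    The classification rule the abstract sketches, recognizing an unseen class from prelearned attribute classifiers, can be written compactly. A simplified version of direct attribute prediction, assuming each attribute classifier outputs a posterior probability; the prior correction follows the usual DAP formulation:

```python
import numpy as np

def dap_predict(attr_prob, class_attr, attr_prior=None):
    """Direct attribute prediction, simplified.

    attr_prob:  (N, M) posteriors p(a_m = 1 | x) from prelearned
                attribute classifiers for N test images.
    class_attr: (Z, M) binary attribute signature of each unseen class.
    Returns the index of the best-matching unseen class per image.
    """
    if attr_prior is None:
        attr_prior = np.full(class_attr.shape[1], 0.5)  # assumed uniform
    eps = 1e-12
    # log p(a_m = a_m^z | x) normalized by the attribute prior
    log_pos = np.log(attr_prob + eps) - np.log(attr_prior + eps)
    log_neg = np.log(1 - attr_prob + eps) - np.log(1 - attr_prior + eps)
    # scores[n, z] = sum over attributes of the class-signature likelihood
    scores = log_pos @ class_attr.T + log_neg @ (1 - class_attr).T
    return scores.argmax(axis=1)
```
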
  • Automatic Alignment of Genus-Zero Surfaces

    Page(s): 466 - 478
    PDF (2224 KB) | HTML

    A new algorithm is presented that provides a constructive way to conformally warp a triangular mesh of genus zero to a destination surface with minimal metric deformation, as well as a means to compute automatically a measure of the geometric difference between two surfaces of genus zero. The algorithm takes as input a pair of surfaces that are topological 2-spheres, each surface given by a distinct triangulation. The algorithm then constructs a map f between the two surfaces. First, each of the two triangular meshes is mapped to the unit sphere using a discrete conformal mapping algorithm. The two mappings are then composed with a Möbius transformation to generate the function f. The Möbius transformation is chosen by minimizing an energy that measures the distance of f from an isometry. We illustrate our approach using several "real life" data sets. We show first that the algorithm allows for accurate, automatic, and landmark-free nonrigid registration of brain surfaces. We then validate our approach by comparing shapes of proteins. We provide numerical experiments to demonstrate that the distances computed with our algorithm between low-resolution, surface-based representations of proteins are highly correlated with the corresponding distances computed between high-resolution, atomistic models for the same proteins.

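    The key building block, composing spherical maps with a Möbius transformation, can be illustrated in isolation: stereographically project the sphere to the complex plane, apply w ↦ (aw + b)/(cw + d), and project back. A minimal sketch; the paper's energy minimization over (a, b, c, d) and the discrete conformal mapping itself are not shown.

```python
import numpy as np

def stereographic(p):
    """Unit-sphere points (N, 3) -> complex plane, projecting from the
    north pole (singular exactly at z = 1)."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    return (x + 1j * y) / (1 - z)

def inverse_stereographic(w):
    d = 1 + np.abs(w) ** 2
    return np.stack([2 * w.real / d,
                     2 * w.imag / d,
                     (np.abs(w) ** 2 - 1) / d], axis=1)

def mobius_on_sphere(points, a, b, c, d):
    """Apply the Mobius map w -> (a w + b) / (c w + d), with
    a*d - b*c != 0, to points on the unit sphere."""
    w = stereographic(points)
    return inverse_stereographic((a * w + b) / (c * w + d))
```
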
  • Fast and Scalable Approximate Spectral Matching for Higher Order Graph Matching

    Page(s): 479 - 492
    PDF (4561 KB) | HTML

    This paper presents a fast and efficient computational approach to higher order spectral graph matching. Exploiting the redundancy in a tensor representing the affinity between feature points, we approximate the affinity tensor with the linear combination of Kronecker products between bases and index tensors. The bases and index tensors are highly compressed representations of the approximated affinity tensor, requiring much smaller memory than in previous methods, which store the full affinity tensor. We compute the principal eigenvector of the approximated affinity tensor using the small bases and index tensors without explicitly storing the approximated tensor. To compensate for the loss of matching accuracy caused by the approximation, we also adopt and incorporate a marginalization scheme that maps a higher order tensor to a matrix, as well as a one-to-one mapping constraint, into the eigenvector computation process. The experimental results show that the proposed method is faster and requires smaller memory than the existing methods with little or no loss of accuracy.

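    For orientation, the eigenvector computation the paper accelerates is the standard higher-order power iteration. A dense toy version over an explicit (n, n, n) affinity tensor; the point of the paper is precisely to avoid materializing this tensor by using a compressed Kronecker-product approximation.

```python
import numpy as np

def tensor_power_iteration(T3, n_iter=100, tol=1e-9):
    """Principal 'eigenvector' of a symmetric third-order affinity
    tensor via higher-order power iteration: v <- normalize(T3 x2 v x3 v).
    T3: (n, n, n) array of triple-wise matching affinities.
    """
    n = T3.shape[0]
    v = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(n_iter):
        v_new = np.einsum('ijk,j,k->i', T3, v, v)
        v_new = np.abs(v_new)            # keep assignment scores nonnegative
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            break
        v = v_new
    return v                             # discretize to matches afterward
```
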
  • Feature Coding in Image Classification: A Comprehensive Study

    Page(s): 493 - 506
    PDF (1260 KB) | HTML

    Image classification is a hot topic in computer vision and pattern recognition. Feature coding, as a key component of image classification, has been widely studied over the past several years, and a number of coding algorithms have been proposed. However, there is no comprehensive study concerning the connections between different coding methods, especially how they have evolved. In this paper, we first survey various feature coding methods, including their motivations and mathematical representations, and then exploit their relations, based on which a taxonomy is proposed to reveal their evolution. Further, we summarize the main characteristics of current algorithms, each of which is shared by several coding strategies. Finally, we choose several representatives from different kinds of coding approaches and empirically evaluate them with respect to the size of the codebook and the number of training samples on several widely used databases (15-Scenes, Caltech-256, PASCAL VOC07, and SUN397). Experimental findings firmly justify our theoretical analysis, which is expected to benefit both practical applications and future research.

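    Two of the oldest coding strategies covered by such surveys can be stated in a few lines each. A toy illustration of hard assignment (vector quantization) versus soft assignment (kernel codebook) coding of one local descriptor, with beta a hypothetical smoothing parameter:

```python
import numpy as np

def hard_assignment(x, codebook):
    """Vector quantization: a one-hot code on the nearest visual word."""
    d = np.linalg.norm(codebook - x, axis=1)
    code = np.zeros(len(codebook))
    code[d.argmin()] = 1.0
    return code

def soft_assignment(x, codebook, beta=1.0):
    """Kernel codebook coding: Gaussian responsibilities over all words."""
    d2 = ((codebook - x) ** 2).sum(axis=1)
    w = np.exp(-beta * d2)
    return w / w.sum()
```
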
  • Good Practice in Large-Scale Learning for Image Classification

    Page(s): 507 - 520
    PDF (2258 KB) | HTML

    We benchmark several SVM objective functions for large-scale image classification. We consider one-versus-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-versus-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-versus-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these "good practices," we were able to improve the state of the art on a large subset of 10K classes and 9M images of ImageNet from 16.7 percent Top-1 accuracy to 19.1 percent.

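    A minimal sketch of the training recipe the abstract describes: one-versus-rest linear SVMs optimized by stochastic gradient descent with a Pegasos-style step size. This is a generic baseline, not the authors' exact code; the rebalancing and early-stopping practices are omitted.

```python
import numpy as np

def sgd_ovr_svm(X, y, num_classes, lam=1e-4, epochs=5):
    """One-versus-rest linear SVMs trained with SGD on the hinge loss."""
    n, d = X.shape
    W = np.zeros((num_classes, d))
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            for c in range(num_classes):
                yc = 1.0 if y[i] == c else -1.0
                margin = yc * (W[c] @ X[i])
                W[c] *= (1 - eta * lam)         # gradient of the L2 term
                if margin < 1:                  # subgradient of the hinge
                    W[c] += eta * yc * X[i]
    return W                                    # predict: argmax_c W[c] @ x
```
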
  • On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval

    Page(s): 521 - 535
    PDF (4192 KB) | HTML

    The problem of cross-modal retrieval from multimedia repositories is considered. This problem addresses the design of retrieval systems that support queries across content modalities, for example, using an image to search for texts. A mathematical formulation is proposed, equating the design of cross-modal retrieval systems to that of isomorphic feature spaces for different content modalities. Two hypotheses are then investigated regarding the fundamental attributes of these spaces. The first is that low-level cross-modal correlations should be accounted for. The second is that the space should enable semantic abstraction. Three new solutions to the cross-modal retrieval problem are then derived from these hypotheses: correlation matching (CM), an unsupervised method which models cross-modal correlations, semantic matching (SM), a supervised technique that relies on semantic representation, and semantic correlation matching (SCM), which combines both. An extensive evaluation of retrieval performance is conducted to test the validity of the hypotheses. All approaches are shown successful for text retrieval in response to image queries and vice versa. It is concluded that both hypotheses hold, in a complementary form, although evidence in favor of the abstraction hypothesis is stronger than that for correlation.

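    Correlation matching (CM) can be approximated in a few lines with canonical correlation analysis: learn maximally correlated projections of the two modalities, then retrieve by distance in the shared space. A sketch using scikit-learn's CCA; the feature matrices and number of components are placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def correlation_matching(img_feats, txt_feats, img_query, txt_db, k=10):
    """Project both modalities into a shared space learned by CCA on
    paired training features, then return the k nearest texts per
    image query. n_components must not exceed either feature dimension."""
    cca = CCA(n_components=10, max_iter=2000)
    cca.fit(img_feats, txt_feats)               # paired image/text features
    q, db = cca.transform(img_query, txt_db)    # map both into shared space
    d = np.linalg.norm(db[None, :, :] - q[:, None, :], axis=2)
    return d.argsort(axis=1)[:, :k]             # indices of closest texts
```
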
  • Online Multiple Kernel Similarity Learning for Visual Search

    Page(s): 536 - 549
    Multimedia
    PDF (1599 KB)

    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple sources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly.

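    One generic ingredient of online multiple kernel learning is a multiplicative (Hedge-style) update that discounts kernels whose similarity predictions violate the incoming constraint. The sketch below shows that ingredient only; it is not the exact OMKS update, and beta is a hypothetical discount rate.

```python
import numpy as np

def update_kernel_weights(weights, per_kernel_loss, beta=0.8):
    """Multiplicatively discount kernels in proportion to their loss on
    the latest similarity constraint, then renormalize."""
    w = weights * (beta ** per_kernel_loss)
    return w / w.sum()

def combined_similarity(weights, kernel_sims):
    """Nonlinear proximity as a convex combination of kernel similarities."""
    return float(weights @ kernel_sims)
```
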
  • Retrieval-Based Face Annotation by Weak Label Regularized Local Coordinate Coding

    Page(s): 550 - 563
    Multimedia
    PDF (2009 KB)

    Auto face annotation, which aims to detect human faces from a facial image and assign them proper human names, is a fundamental research problem and beneficial to many real-world applications. In this work, we address this problem by investigating a retrieval-based annotation scheme of mining massive web facial images that are freely available over the Internet. In particular, given a facial image, we first retrieve the top n similar instances from a large-scale web facial image database using content-based image retrieval techniques, and then use their labels for auto annotation. Such a scheme has two major challenges: 1) how to retrieve the similar facial images that truly match the query, and 2) how to exploit the noisy labels of the top similar facial images, which may be incorrect or incomplete due to the nature of web images. In this paper, we propose an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding by learning sparse features, and employs the idea of graph-based weak label regularization to enhance the weak labels of the similar facial images. An efficient optimization algorithm is proposed to solve the WLRLCC problem. Moreover, an effective sparse reconstruction scheme is developed to perform the face annotation task. We conduct extensive empirical studies on several web facial image databases to evaluate the proposed WLRLCC algorithm from different aspects. The experimental results validate its efficacy. We share the two constructed databases "WDB" (714,454 images of 6,025 people) and "ADB" (126,070 images of 1,200 people) with the public. To further improve the efficiency and scalability, we also propose an offline approximation scheme (AWLRLCC) which generally maintains comparable results but significantly reduces the annotation time.

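    The retrieval-based scheme, before the label refinement step, can be sketched directly: retrieve the top-n similar faces and take a similarity-weighted vote over their noisy name labels. The WLRLCC refinement itself (local coordinate coding plus graph-based weak label regularization) is not reproduced here.

```python
import numpy as np

def annotate_by_retrieval(query, db_feats, db_labels, n=20, top_k=3):
    """Weighted-vote face annotation from the n nearest database faces.

    db_labels: list of sets of name ids attached to each database image
    (possibly incorrect or incomplete, as with web images).
    """
    sims = db_feats @ query / (np.linalg.norm(db_feats, axis=1)
                               * np.linalg.norm(query) + 1e-12)
    top = sims.argsort()[::-1][:n]          # top-n most similar faces
    votes = {}
    for i in top:
        for name in db_labels[i]:
            votes[name] = votes.get(name, 0.0) + sims[i]
    return sorted(votes, key=votes.get, reverse=True)[:top_k]
```
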
  • Scene Particles: Unregularized Particle-Based Scene Flow Estimation

    Page(s): 564 - 576
    Multimedia
    PDF (2081 KB)

    In this paper, an algorithm is presented for estimating scene flow, which is a richer, 3D analog of optical flow. The approach operates orders of magnitude faster than alternative techniques and is well suited to further performance gains through parallelized implementation. The algorithm employs multiple hypotheses to deal with motion ambiguities, rather than the traditional smoothness constraints, removing oversmoothing errors and providing significant performance improvements on benchmark data, over the previous state of the art. The approach is flexible and capable of operating with any combination of appearance and/or depth sensors, in any setup, simultaneously estimating the structure and motion if necessary. Additionally, the algorithm propagates information over time to resolve ambiguities, rather than performing an isolated estimation at each frame, as in contemporary approaches. Approaches to smoothing the motion field without sacrificing the benefits of multiple hypotheses are explored, and a probabilistic approach to occlusion estimation is demonstrated, leading to 10 and 15 percent improved performance, respectively. Finally, a data-driven tracking approach is described, and used to estimate the 3D trajectories of hands during sign language, without the need to model complex appearance variations at each viewpoint.

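    The core idea, keeping several motion hypotheses per scene point rather than enforcing smoothness, resembles particle filtering. A heavily simplified resampling step, with nonnegative photoconsistency scores assumed given; this illustrates the multiple-hypothesis principle only, not the paper's algorithm.

```python
import numpy as np

def propagate_hypotheses(hyps, scores, motion_noise=0.01, keep=10):
    """Resample the best-scoring 3D motion hypotheses for one scene point
    and perturb them, so ambiguous motions stay multimodal until later
    evidence resolves them.

    hyps: (K, 3) candidate 3D motions; scores: (K,) photoconsistency."""
    p = scores / scores.sum()
    idx = np.random.choice(len(hyps), size=keep, p=p)
    return hyps[idx] + motion_noise * np.random.randn(keep, 3)
```
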
  • Simultaneous Tensor Decomposition and Completion Using Factor Priors

    Page(s): 577 - 591
    Multimedia
    PDF (2691 KB)

    The success of research on matrix completion is evident in a variety of real-world applications. Tensor completion, which is a high-order extension of matrix completion, has also generated a great deal of research interest in recent years. Given a tensor with incomplete entries, existing methods use either factorization or completion schemes to recover the missing parts. However, as the number of missing entries increases, factorization schemes may overfit the model because of incorrectly predefined ranks, while completion schemes may fail to interpret the model factors. In this paper, we introduce a novel concept: complete the missing entries and simultaneously capture the underlying model structure. To this end, we propose a method called simultaneous tensor decomposition and completion (STDC) that combines a rank minimization technique with Tucker model decomposition. Moreover, as the model structure is implicitly included in the Tucker model, we use factor priors, which are usually known a priori in real-world tensor objects, to characterize the underlying joint-manifold drawn from the model factors. By exploiting this auxiliary information, our method leverages two classic schemes and accurately estimates the model factors and missing entries. We conducted experiments to empirically verify the convergence of our algorithm on synthetic data and evaluate its effectiveness on various kinds of real-world data. The results demonstrate the efficacy of the proposed method and its potential usage in tensor-based applications. It also outperforms state-of-the-art methods on multilinear model analysis and visual data completion tasks.

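    A much cruder relative of STDC helps fix ideas: alternate between a truncated Tucker (HOSVD) reconstruction and re-imposing the observed entries. Unlike STDC, this baseline has no factor priors and no rank minimization; the Tucker ranks are fixed by hand.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_completion(T, mask, ranks, n_iter=50):
    """Iterative low-rank tensor completion.

    mask: boolean array, True where entries are observed.
    ranks: one Tucker rank per mode (len(ranks) == T.ndim)."""
    X = np.where(mask, T, T[mask].mean())     # initialize missing entries
    for _ in range(n_iter):
        # Truncated HOSVD: leading singular vectors of each unfolding.
        Us = [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
              for m, r in enumerate(ranks)]
        core = X
        for m, U in enumerate(Us):            # project onto the factors
            core = np.moveaxis(
                np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
        rec = core
        for m, U in enumerate(Us):            # expand back to full size
            rec = np.moveaxis(
                np.tensordot(U, np.moveaxis(rec, m, 0), axes=1), 0, m)
        X = np.where(mask, T, rec)            # keep observed entries fixed
    return X
```
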
  • Tensor Sparse Coding for Positive Definite Matrices

    Page(s): 592 - 605
    PDF (1668 KB) | HTML

    In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for example, image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.

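    For contrast with the paper's vectorization-free formulation, one standard alternative in the literature is the log-Euclidean embedding: take the matrix logarithm of each SPD matrix and vectorize it so that ordinary vector sparse coding applies. A sketch of that embedding only; it is not the paper's method.

```python
import numpy as np
from scipy.linalg import logm

def spd_log_vector(S):
    """Log-Euclidean embedding of an SPD matrix: matrix log, then the
    upper triangle with off-diagonals scaled by sqrt(2) so the Euclidean
    inner product of the vectors matches the Frobenius inner product of
    the log-matrices."""
    L = logm(S).real                     # real for SPD input
    n = L.shape[0]
    iu = np.triu_indices(n, k=1)
    return np.concatenate([np.diag(L), np.sqrt(2) * L[iu]])
```
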
  • Variational Light Field Analysis for Disparity Estimation and Super-Resolution

    Page(s): 606 - 619
    PDF (4147 KB) | HTML

    We develop a continuous framework for the analysis of 4D light fields, and describe novel variational methods for disparity reconstruction as well as spatial and angular super-resolution. Disparity maps are estimated locally using epipolar plane image analysis without the need for expensive matching cost minimization. The method works fast and with inherent subpixel accuracy since no discretization of the disparity space is necessary. In a variational framework, we employ the disparity maps to generate super-resolved novel views of a scene, which corresponds to increasing the sampling rate of the 4D light field in spatial as well as angular direction. In contrast to previous work, we formulate the problem of view synthesis as a continuous inverse problem, which allows us to correctly take into account foreshortening effects caused by scene geometry transformations. All optimization problems are solved with state-of-the-art convex relaxation techniques. We test our algorithms on a number of real-world examples as well as our new benchmark data set for light fields, and compare results to a multiview stereo method. The proposed method is both faster and more accurate. Data sets and source code are provided online for additional evaluation.

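    The local disparity estimator the abstract alludes to can be sketched with a structure tensor on an epipolar plane image: the direction of least intensity change in the (view, column) plane gives the EPI line slope, which is the disparity. A minimal version with sigma and rho as hypothetical smoothing scales; the paper's variational integration and super-resolution stages are not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def epi_disparity(epi, sigma=1.0, rho=2.0):
    """Local disparity from an epipolar plane image.

    epi: 2D array with axes (s, x) = (view index, image column). A scene
    point traces a line of slope dx/ds = disparity; we take the direction
    of the smallest-eigenvalue eigenvector of the smoothed 2x2 structure
    tensor as that line direction."""
    gs = gaussian_filter(epi, sigma, order=(1, 0))   # derivative along s
    gx = gaussian_filter(epi, sigma, order=(0, 1))   # derivative along x
    Jss = gaussian_filter(gs * gs, rho)
    Jxx = gaussian_filter(gx * gx, rho)
    Jsx = gaussian_filter(gs * gx, rho)
    # Smallest eigenvalue of [[Jss, Jsx], [Jsx, Jxx]] and its eigenvector
    lam = 0.5 * (Jss + Jxx - np.sqrt((Jss - Jxx) ** 2 + 4 * Jsx ** 2))
    vs, vx = Jsx, lam - Jss
    return vx / (vs + 1e-12)   # disparity = dx/ds; eps guards flat regions
```
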
  • The Cues in "Dependent Multiple Cue Integration for Robust Tracking" Are Independent

    Page(s): 620 - 621
    PDF (90 KB) | HTML

    A methodology for integrating multiple cues for tracking was proposed in several papers. These papers claim that, unlike other methodologies, conditional independence of the cues is not assumed. This brief communication 1) refutes this claim and 2) points out other major problems in the methodology.

  • A Fair Comparison Should Be Based on the Same Protocol--Comments on "Trainable Convolution Filters and Their Application to Face Recognition"

    Page(s): 622 - 623
    PDF (555 KB)

    We comment on a paper describing an image classification approach called the Volterra kernel classifier, named Volterrafaces when applied to face recognition, whose performance was evaluated by experiments on face recognition databases. We find that its comparisons with the state of the art on three databases were based on unfair settings. We generate results under the standard protocol on the three data sets, which show that Volterrafaces achieves state-of-the-art performance on only one database.

    Open Access
  • IEEE Open Access Publishing

    Page(s): 624
    PDF (165 KB)
    Freely Available from IEEE
  • [Back inside cover]

    Page(s): c3
    PDF (283 KB)
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (274 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois