
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 6 • June 2012


  • [Front cover]

    Page(s): c1
    PDF (168 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (200 KB)
    Freely Available from IEEE
  • A Least-Squares Framework for Component Analysis

    Page(s): 1041 - 1055
    PDF (532 KB) | HTML

    Over the last century, Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), Locality Preserving Projections (LPP), and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. CA techniques are appealing because many can be formulated as eigen-problems, offering great potential for learning linear and nonlinear representations of data in closed form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigen-problems with rank-deficient matrices (e.g., the small sample size problem), the lack of an intuitive interpretation of normalization factors, and difficulty in understanding commonalities and differences between CA methods. This paper proposes a unified least-squares framework that formulates many CA methods. We show how PCA, LDA, CCA, LPP, SC, and their kernel and regularized extensions correspond to particular instances of least-squares weighted kernel reduced rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: 1) it provides a clean connection between many CA techniques and an intuitive framework for understanding normalization factors; 2) it yields efficient numerical schemes for solving CA techniques; 3) it overcomes the small sample size problem; and 4) it provides a framework for easily extending CA methods. We derive weighted generalizations of PCA, LDA, SC, and CCA, as well as several new CA techniques.

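    The following is a minimal numerical sketch, written for this summary rather than taken from the paper, of the textbook special case that LS-WKRRR generalizes: plain PCA posed as an unweighted, linear least-squares reduced rank regression of the data onto itself. All variable names are illustrative.

    ```python
    # Sketch: PCA as the unweighted, linear special case of least-squares
    # reduced rank regression, min_{A,B} ||X - A B^T X||_F^2 with A, B of rank r.
    # (Illustrative only; the paper's LS-WKRRR adds weights and kernels.)
    import numpy as np

    rng = np.random.default_rng(0)
    r = 5
    # Data with a planted rank-r structure plus noise, centered.
    X = rng.standard_normal((40, r)) @ rng.standard_normal((r, 300)) \
        + 0.1 * rng.standard_normal((40, 300))
    X -= X.mean(axis=1, keepdims=True)

    # Reference answer: top-r principal directions from the SVD.
    U = np.linalg.svd(X, full_matrices=False)[0][:, :r]

    # Alternating least squares on the reduced rank regression objective.
    A = rng.standard_normal((40, r))
    XXt = X @ X.T
    for _ in range(200):
        B = A @ np.linalg.inv(A.T @ A)                # optimal B for fixed A
        A = XXt @ B @ np.linalg.inv(B.T @ XXt @ B)    # optimal A for fixed B

    # span(A) matches the PCA subspace: the projector difference is ~ 0.
    P_als = A @ np.linalg.pinv(A)
    P_pca = U @ U.T
    print(np.linalg.norm(P_als - P_pca))              # small (≈ 0)
    ```
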
  • Detecting Carried Objects from Sequences of Walking Pedestrians

    Page(s): 1056 - 1067
    PDF (2457 KB) | HTML

    This paper proposes a method for detecting objects carried by pedestrians, such as backpacks and suitcases, from video sequences. In common with earlier work [14], [16] on the same problem, the method produces a representation of motion and shape (known as a temporal template) that has some immunity to noise in foreground segmentations and to the phase of the walking cycle. Our key novelty is to reveal carried objects by comparing the temporal templates against view-specific exemplars generated offline for unencumbered pedestrians. A likelihood map of protrusions, obtained from this match, is combined in a Markov random field for spatial continuity, from which we obtain a segmentation of carried objects using the MAP solution. We also compare the previously used method of periodicity analysis for distinguishing carried objects from other protrusions with the use of prior probabilities for carried-object locations relative to the silhouette. We have reimplemented the earlier state-of-the-art method [14] and demonstrate a substantial improvement in performance for the new method on the PETS2006 data set. The carried-object detector is also tested on another outdoor data set. Although developed for a specific problem, the method could be applied to the detection of irregularities in appearance for other categories of object that move in a periodic fashion.

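    As a rough illustration of the comparison described above (not the authors' implementation, and with toy data standing in for real silhouettes and exemplars), a temporal template can be formed by averaging aligned foreground masks, and protrusions flagged where it exceeds an unencumbered-pedestrian exemplar; the paper additionally smooths this map with a Markov random field and takes the MAP segmentation.

    ```python
    # Toy sketch of the protrusion-likelihood step: average aligned foreground
    # masks into a temporal template and compare against an exemplar.
    # (Illustrative only; no MRF smoothing or MAP segmentation here.)
    import numpy as np

    rng = np.random.default_rng(1)
    T, H, W = 30, 64, 32                       # frames, height, width (stand-in sizes)
    masks = rng.random((T, H, W)) > 0.5        # stand-in for aligned foreground masks
    template = masks.mean(axis=0)              # temporal template, values in [0, 1]

    # Stand-in exemplar of an unencumbered pedestrian for the same viewpoint.
    exemplar = np.clip(template + 0.05 * rng.standard_normal((H, W)), 0.0, 1.0)

    protrusion = np.clip(template - exemplar, 0.0, 1.0)   # excess over the exemplar
    carried_object_mask = protrusion > 0.2                # hypothetical threshold
    print(carried_object_mask.sum())
    ```
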
  • Fast Bundle Algorithm for Multiple-Instance Learning

    Page(s): 1068 - 1079
    PDF (1588 KB) | HTML

    We present a bundle algorithm for multiple-instance classification and ranking. These frameworks yield improved models on many problems possessing special structure. Multiple-instance loss functions are typically nonsmooth and nonconvex, and current algorithms convert these to smooth nonconvex optimization problems that are solved iteratively. Inspired by the latest linear-time subgradient-based methods for support vector machines, we optimize the objective directly using a nonconvex bundle method. Computational results show that the method scales linearly without sacrificing generalization accuracy, permitting modeling of new and larger data sets in computational chemistry and other applications. This new implementation also facilitates modeling with kernels.

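    For intuition about why such losses are nonsmooth and nonconvex (and not as a description of the paper's bundle method), here is a plain subgradient step on a max-over-instances hinge loss for bag classification; all names and data are illustrative.

    ```python
    # Sketch: subgradient descent on a simple multiple-instance hinge loss,
    #   L(w) = (lam/2)||w||^2 + (1/n) sum_b max(0, 1 - y_b * max_i <w, x_{b,i}>),
    # where each bag b is scored by the max over its instances (the source of
    # nonsmoothness and nonconvexity). Illustrative baseline, not the paper's
    # nonconvex bundle method.
    import numpy as np

    rng = np.random.default_rng(2)
    dim, n_bags, lam, lr = 20, 50, 1e-3, 0.1
    bags = [rng.standard_normal((rng.integers(3, 8), dim)) for _ in range(n_bags)]
    labels = rng.choice([-1.0, 1.0], size=n_bags)

    w = np.zeros(dim)
    for _ in range(200):
        g = lam * w                                   # gradient of the regularizer
        for Xb, y in zip(bags, labels):
            scores = Xb @ w
            i = int(np.argmax(scores))                # max over instances
            if 1.0 - y * scores[i] > 0.0:             # hinge is active
                g -= y * Xb[i] / n_bags               # a subgradient contribution
        w -= lr * g
    print(float(np.linalg.norm(w)))                   # norm of the learned weights
    ```
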
  • Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes

    Page(s): 1080 - 1091
    PDF (1805 KB) | HTML

    Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models, under two distinct data labeling scenarios.

  • Kernelized Locality-Sensitive Hashing

    Page(s): 1092 - 1104
    PDF (1833 KB) | HTML

    Fast retrieval methods are critical for many large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply to high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sublinear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several data sets and show that it enables accurate and fast performance for several vision problems, including example-based object classification, local feature matching, and content-based retrieval.

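    For readers unfamiliar with the baseline being generalized, the sketch below shows standard random-hyperplane LSH for explicit feature vectors; the article's contribution is to construct such hash functions when only kernel values, not the feature embedding, are available. Names and parameters are illustrative.

    ```python
    # Sketch: random-hyperplane (sign) LSH for explicit vectors. Similar items
    # collide in more bits than dissimilar ones, enabling sublinear-time search.
    # (Baseline only; the paper shows how to do this with arbitrary kernels.)
    import numpy as np

    rng = np.random.default_rng(3)
    d, n_bits = 128, 64
    R = rng.standard_normal((n_bits, d))              # random hyperplane normals

    def hash_bits(x):
        """Return the n_bits sign hash of x as a 0/1 vector."""
        return (R @ x > 0).astype(np.uint8)

    x = rng.standard_normal(d)
    y = x + 0.3 * rng.standard_normal(d)              # a near neighbor of x
    z = rng.standard_normal(d)                        # an unrelated vector

    hamming = lambda a, b: int(np.count_nonzero(a != b))
    print(hamming(hash_bits(x), hash_bits(y)))        # small
    print(hamming(hash_bits(x), hash_bits(z)))        # ≈ n_bits / 2
    ```
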
  • Latent Log-Linear Models for Handwritten Digit Classification

    Page(s): 1105 - 1117
    PDF (2233 KB) | HTML

    We present latent log-linear models, an extension of log-linear models incorporating latent variables, and we propose two applications thereof: log-linear mixture models and image deformation-aware log-linear models. The resulting models are fully discriminative, can be trained efficiently, and their model complexity can be controlled. Log-linear mixture models offer additional flexibility within the log-linear modeling framework. Unlike previous approaches, the image deformation-aware model directly considers image deformations and allows for discriminative training of the deformation parameters. Both models are trained using alternating optimization. For certain variants, convergence to a stationary point is guaranteed and, in practice, even variants without this guarantee converge and find models that perform well. We tune the methods on the USPS data set and evaluate on the MNIST data set, demonstrating the generalization capabilities of our proposed models. Although they use significantly fewer parameters, our models obtain results competitive with those proposed in the literature.

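    The sketch below spells out the class posterior of a log-linear mixture model, one of the two applications named in the abstract, with the latent mixture component marginalized out; the parameterization and names are our own shorthand, and no training is shown.

    ```python
    # Sketch: class posterior of a log-linear mixture model,
    #   p(c | x) ∝ sum_m exp( w_{c,m} · x + b_{c,m} ),
    # where m indexes latent mixture components per class. Form only, no training.
    import numpy as np

    rng = np.random.default_rng(4)
    n_classes, n_components, dim = 10, 4, 256          # e.g., digit classes
    W = 0.01 * rng.standard_normal((n_classes, n_components, dim))
    b = np.zeros((n_classes, n_components))

    def posterior(x):
        scores = np.einsum('cmd,d->cm', W, x) + b       # (class, component) scores
        scores -= scores.max()                          # numerical stability
        unnorm = np.exp(scores).sum(axis=1)             # marginalize the latent m
        return unnorm / unnorm.sum()

    x = rng.random(dim)                                 # stand-in feature vector
    print(posterior(x).sum())                           # 1.0
    ```
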
  • Monocular 3D Reconstruction of Locally Textured Surfaces

    Page(s): 1118 - 1130
    Multimedia
    PDF (2984 KB)

    Most recent approaches to monocular nonrigid 3D shape recovery rely on exploiting point correspondences and work best when the whole surface is well textured. The alternative is to rely on either contours or shading information, which has so far been demonstrated only in very restrictive settings. Here, we propose a novel approach to monocular deformable shape recovery that can operate under complex lighting and handle partially textured surfaces. At the heart of our algorithm are a learned mapping from intensity patterns to the shape of local surface patches and a principled approach to piecing together the resulting local shape estimates. We validate our approach quantitatively and qualitatively using both synthetic and real data.

  • Prototype-Based Domain Description for One-Class Classification

    Page(s): 1131 - 1144
    PDF (2055 KB)

    This work introduces the Prototype-based Domain Description rule (PDD), a one-class classifier. PDD is a nearest neighbor-based classifier since it accepts objects on the basis of their nearest neighbor distances in a reference set of objects, also called prototypes. For a suitable choice of the prototype set, the PDD classifier is equivalent to another nearest neighbor-based one-class classifier, namely, the NNDD classifier. Moreover, it generalizes statistical tests for outlier detection. The concept of a PDD consistent subset is introduced, which exploits only a selected subset of the training set. It is shown that computing a minimum size PDD consistent subset is, in general, not approximable within any constant factor. A logarithmic approximation factor algorithm, called the CPDD algorithm, for computing a minimum size PDD consistent subset is then introduced. In order to efficiently manage very large data sets, a variant of the basic rule, called Fast CPDD, is also presented. Experimental results show that the CPDD rule improves considerably over the CNNDD classifier, namely the condensed variant of NNDD, in terms of the size of the subset while guaranteeing comparable classification quality, that it is competitive with other one-class classification methods, and that it is suitable for classifying large data sets.

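    To make the acceptance rule concrete, here is a generic nearest-prototype one-class decision of the kind the abstract describes; the threshold and data are illustrative, and this is not the paper's exact PDD rule or its CPDD subset selection.

    ```python
    # Sketch: accept a query if its distance to the nearest prototype in the
    # reference set is below a threshold (generic nearest-neighbor domain
    # description; illustrative, not the exact PDD/CPDD formulation).
    import numpy as np

    rng = np.random.default_rng(5)
    prototypes = rng.standard_normal((200, 2))        # reference set for the target class
    threshold = 0.5                                   # hypothetical acceptance radius

    def accept(x):
        nearest = np.min(np.linalg.norm(prototypes - x, axis=1))
        return nearest <= threshold

    print(accept(np.zeros(2)))                        # inside the cloud: likely True
    print(accept(np.array([8.0, 8.0])))               # far away: False
    ```
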
  • Reading between the Lines: Object Localization Using Implicit Cues from Image Tags

    Page(s): 1145 - 1158
    Multimedia
    PDF (5646 KB) | HTML

    Current uses of tagged images typically exploit only the most explicit information: the link between the nouns named and the objects present somewhere in the image. We propose to leverage “unspoken” cues that rest within an ordered list of image tags so as to improve object localization. We define three novel implicit features from an image's tags: the relative prominence of each object as signified by its order of mention, the scale constraints implied by unnamed objects, and the loose spatial links hinted at by the proximity of names on the list. By learning a conditional density over the localization parameters (position and scale) given these cues, we show how to improve both accuracy and efficiency when detecting the tagged objects. Furthermore, we show how the localization density can be learned in a semantic space shared by the visual and tag-based features, which makes the technique applicable for detection in untagged input images. We validate our approach on the PASCAL VOC, LabelMe, and Flickr image data sets, and demonstrate its effectiveness relative to both traditional sliding windows and a visual context baseline. Our algorithm improves on state-of-the-art methods, successfully translating insights about human viewing behavior (such as attention, perceived importance, or gaze) into enhanced object detection.

  • Rhythmic Brushstrokes Distinguish van Gogh from His Contemporaries: Findings via Automated Brushstroke Extraction

    Page(s): 1159 - 1176
    PDF (4394 KB) | HTML

    Art historians have long observed the highly characteristic brushstroke styles of Vincent van Gogh and have relied on discerning these styles for authenticating and dating his works. In our work, we compared van Gogh with his contemporaries by statistically analyzing a massive set of automatically extracted brushstrokes. A novel extraction method is developed that integrates edge detection and clustering-based segmentation. Evidence substantiates that van Gogh's brushstrokes are strongly rhythmic. That is, regularly shaped brushstrokes are tightly arranged, creating a repetitive and patterned impression. We also found that the traits that distinguish van Gogh's paintings in different time periods of his development are all different from those distinguishing van Gogh from his peers. This study confirms that the combined brushwork features identified as special to van Gogh are consistently held throughout his French periods of production (1886-1890).

  • Simultaneously Fitting and Segmenting Multiple-Structure Data with Outliers

    Page(s): 1177 - 1192
    PDF (4038 KB) | HTML

    We propose a robust fitting framework, called Adaptive Kernel-Scale Weighted Hypotheses (AKSWH), to segment multiple-structure data even in the presence of a large number of outliers. Our framework contains a novel scale estimator called the Iterative Kth Ordered Scale Estimator (IKOSE). IKOSE can accurately estimate the scale of inliers for heavily corrupted multiple-structure data and is of interest in its own right, since it can be used in other robust estimators. In addition to IKOSE, our framework includes several original elements based on the weighting, clustering, and fusing of hypotheses. AKSWH can simultaneously provide accurate estimates of the number of model instances and of the parameters and scale of each instance. We demonstrate good performance in practical applications such as line fitting, circle fitting, range image segmentation, homography estimation, and two-view-based motion segmentation, using both synthetic data and real images.

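    As background for the scale-estimation step, the sketch below reconstructs the standard kth ordered statistics estimate that generalizes the median absolute deviation; it is not the paper's exact IKOSE, which additionally iterates and refines such an estimate, and the formula here rests on a Gaussian inlier assumption.

    ```python
    # Sketch: kth ordered scale estimate  s = |r|_(k) / Phi^{-1}((1 + k/n) / 2),
    # a generalization of MAD that resists gross outliers. Reconstruction for
    # intuition only; the paper's IKOSE iterates/refines such an estimate.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    inlier_res = rng.normal(0.0, 0.5, size=800)        # residuals of true inliers (scale 0.5)
    outlier_res = rng.uniform(-20.0, 20.0, size=200)   # gross outlier residuals
    residuals = np.concatenate([inlier_res, outlier_res])

    def kth_ordered_scale(res, k_frac=0.1):
        r = np.sort(np.abs(res))
        n = len(r)
        k = max(1, int(k_frac * n))
        return r[k - 1] / norm.ppf(0.5 * (1.0 + k / n))

    # Within tens of percent of the true 0.5 (the residual bias from outliers is
    # what an iterative refinement corrects), unlike the sample standard
    # deviation, which the outliers inflate roughly tenfold.
    print(kth_ordered_scale(residuals), np.std(residuals))
    ```
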
  • Spacetime Texture Representation and Recognition Based on a Spatiotemporal Orientation Analysis

    Page(s): 1193 - 1205
    PDF (2819 KB) | HTML

    This paper is concerned with the representation and recognition of the observed dynamics (i.e., excluding purely spatial appearance cues) of spacetime texture based on a spatiotemporal orientation analysis. The term “spacetime texture” is taken to refer to patterns in visual spacetime, (x,y,t), that primarily are characterized by the aggregate dynamic properties of elements or local measurements accumulated over a region of spatiotemporal support, rather than in terms of the dynamics of individual constituents. Examples include image sequences of natural processes that exhibit stochastic dynamics (e.g., fire, water, and windblown vegetation) as well as images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in imagery, such as pedestrians and vehicular traffic). Spacetime texture representation and recognition are important as they provide an early means of capturing the structure of an ensuing image stream in a meaningful fashion. Toward such ends, a novel approach to spacetime texture representation and an associated recognition method are described based on distributions (histograms) of spacetime orientation structure. Empirical evaluation on both standard and original image data sets shows the promise of the approach, including significant improvement over alternative state-of-the-art approaches in recognizing the same pattern from different viewpoints.

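    As a very rough stand-in for the representation described above (a plain 3D gradient-orientation histogram rather than the paper's oriented spacetime filter energy measurements), the sketch below summarizes a spacetime volume by a distribution of local spatiotemporal orientation.

    ```python
    # Crude sketch: histogram of spatiotemporal gradient directions over a
    # (t, y, x) volume as an orientation-distribution descriptor. Stand-in only;
    # the paper uses distributions of oriented spacetime filter energies.
    import numpy as np

    rng = np.random.default_rng(7)
    vol = rng.random((16, 32, 32))                     # stand-in spacetime volume (t, y, x)
    gt, gy, gx = np.gradient(vol)                      # spatiotemporal gradients

    g = np.stack([gt, gy, gx], axis=-1).reshape(-1, 3)
    mag = np.linalg.norm(g, axis=1) + 1e-8
    u = g / mag[:, None]                               # unit orientation vectors

    # Quantize each axis of the unit vector into 3 bins -> 27 orientation cells.
    bins = np.clip(((u + 1.0) / 2.0 * 3).astype(int), 0, 2)
    cell = bins[:, 0] * 9 + bins[:, 1] * 3 + bins[:, 2]
    hist = np.bincount(cell, weights=mag, minlength=27)
    hist /= hist.sum()                                 # the descriptor
    print(hist.shape, float(hist.sum()))
    ```
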
  • Spatiotemporal Stereo and Scene Flow via Stequel Matching

    Page(s): 1206 - 1219
    PDF (3438 KB) | HTML

    This paper is concerned with the recovery of temporally coherent estimates of 3D structure and motion of a dynamic scene from a sequence of binocular stereo images. A novel approach is presented based on matching of spatiotemporal quadric elements (stequels) between views, as this primitive encapsulates both spatial and temporal image structure for 3D estimation. Match constraints are developed for bringing stequels into correspondence across binocular views. With correspondence established, temporally coherent disparity estimates are obtained without explicit motion recovery. Further, the matched stequels are also shown to support direct recovery of scene flow estimates. Extensive algorithmic evaluation with ground truth data, incorporated in both local and global correspondence paradigms, shows the considerable benefit of using stequels as a matching primitive and their advantages in comparison to alternative methods of enforcing temporal coherence in disparity estimation. Additional experiments document the usefulness of stequel matching for 3D scene flow estimation.

  • A Blur-Robust Descriptor with Applications to Face Recognition

    Page(s): 1220 - 1226
    Multimedia
    PDF (1335 KB) | HTML

    Understanding the effect of blur is an important problem in unconstrained visual analysis. We address this problem in the context of image-based recognition by a fusion of image-formation models and differential geometric tools. First, we discuss the space spanned by blurred versions of an image and then, under certain assumptions, provide a differential geometric analysis of that space. More specifically, we create a subspace resulting from convolution of an image with a complete set of orthonormal basis functions of a prespecified maximum size (that can represent an arbitrary blur kernel within that size), and show that the corresponding subspaces created from a clean image and its blurred versions are equal under the ideal case of zero noise and some assumptions on the properties of blur kernels. We then study the practical utility of this subspace representation for the problem of direct recognition of blurred faces by viewing the subspaces as points on the Grassmann manifold, and present methods to perform recognition for both homogeneous and spatially varying blur. We empirically analyze the effect of noise, as well as the presence of other facial variations between the gallery and probe images, and provide comparisons with existing approaches on standard data sets.

  • Ensemble Manifold Regularization

    Page(s): 1227 - 1233
    Multimedia
    PDF (1119 KB) | HTML

    We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning (SSL) problems. Unfortunately, it is not trivial to define an optimization function for obtaining optimal hyperparameters. Usually, cross validation is applied, but it does not necessarily scale up. Other problems derive from the suboptimality incurred by discrete grid search and from overfitting. We therefore develop an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, EMR is carefully designed so that it 1) learns both the composite manifold and the semi-supervised learner jointly, 2) is fully automatic in learning the intrinsic manifold hyperparameters implicitly, 3) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption, and 4) is scalable to a large number of candidate manifold hyperparameters, from both time and space perspectives. Furthermore, we prove the convergence of EMR to its deterministic counterpart at a root-n rate. Extensive experiments over both synthetic and real data sets demonstrate the effectiveness of the proposed framework.

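    To make the core idea tangible, the sketch below builds several k-NN graph Laplacians under different hyperparameter guesses and forms their convex combination; it is our own illustration, with the combination weights simply fixed to uniform, whereas EMR learns them jointly with the semi-supervised learner.

    ```python
    # Sketch: approximate the intrinsic manifold by a convex combination of
    # candidate graph Laplacians built with different hyperparameters.
    # (Weights are fixed to uniform here; EMR learns them jointly with the learner.)
    import numpy as np

    def knn_laplacian(X, k, sigma):
        """Unnormalized Laplacian of a symmetrized k-NN graph with Gaussian weights."""
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.zeros_like(D2)
        for i in range(len(X)):
            nbrs = np.argsort(D2[i])[1:k + 1]          # k nearest neighbors, excluding self
            W[i, nbrs] = np.exp(-D2[i, nbrs] / (2.0 * sigma ** 2))
        W = np.maximum(W, W.T)                         # symmetrize
        return np.diag(W.sum(axis=1)) - W

    rng = np.random.default_rng(8)
    X = rng.standard_normal((100, 5))                  # stand-in data
    candidates = [knn_laplacian(X, k, s) for k in (5, 10) for s in (0.5, 1.0)]
    mu = np.full(len(candidates), 1.0 / len(candidates))   # convex combination weights
    L_ensemble = sum(m * L for m, L in zip(mu, candidates))
    print(L_ensemble.shape)
    ```
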
  • Medial Spheres for Shape Approximation

    Page(s): 1234 - 1240
    PDF (1648 KB) | HTML

    We study the problem of approximating a 3D solid with a union of overlapping spheres. In comparison with a state-of-the-art approach, our method offers more than an order of magnitude speedup and achieves a tighter approximation, in terms of volume difference with the original solid, while using fewer spheres. The spheres generated by our method are internal and tangent to the solid's boundary, which permits an exact error analysis, fast updates under local feature size preserving deformation, and conservative dilation. We show that our dilated spheres offer superior time and error performance in approximate separation distance tests compared to the state-of-the-art method for sphere set approximation, for the class of (σ, θ)-fat solids. We envision that our sphere-based approximation will also prove useful for a range of other applications, including shape matching and shape segmentation.

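    The volume-difference criterion mentioned above can be illustrated with a quick Monte Carlo check, using a toy solid and hypothetical spheres of our own choosing; nothing here reproduces the paper's sphere-generation algorithm.

    ```python
    # Sketch: Monte Carlo estimate of the volume of the symmetric difference
    # between a toy solid (a box) and a union of spheres approximating it.
    # Illustrative only; the spheres here are arbitrary, not the paper's output.
    import numpy as np

    rng = np.random.default_rng(9)
    half_extents = np.array([1.0, 0.5, 0.25])          # toy solid: an axis-aligned box

    def in_solid(p):
        return np.all(np.abs(p) <= half_extents, axis=-1)

    spheres = [(np.array([0.0, 0.0, 0.0]), 0.25),       # hypothetical approximating spheres
               (np.array([0.6, 0.0, 0.0]), 0.25),
               (np.array([-0.6, 0.0, 0.0]), 0.25)]

    def in_union(p):
        return np.any([np.linalg.norm(p - c, axis=-1) <= r for c, r in spheres], axis=0)

    pts = rng.uniform(-1.5, 1.5, size=(200_000, 3))     # samples in a bounding box
    box_volume = 3.0 ** 3
    sym_diff_volume = np.mean(in_solid(pts) ^ in_union(pts)) * box_volume
    print(sym_diff_volume)                               # smaller means a tighter approximation
    ```
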
  • VCells: Simple and Efficient Superpixels Using Edge-Weighted Centroidal Voronoi Tessellations

    Page(s): 1241 - 1247
    PDF (1936 KB)

    VCells, the proposed algorithm based on Edge-Weighted Centroidal Voronoi Tessellations (EWCVTs), is used to generate superpixels, i.e., an oversegmentation of an image. For a wide range of images, the new algorithm is capable of generating roughly uniform subregions while nicely preserving local image boundaries. The undersegmentation error is effectively limited in a controllable manner. Moreover, VCells is very efficient, with a core computational cost of O(K·√n_c·N), in which K, n_c, and N are the number of iterations, superpixels, and pixels, respectively. Extensive qualitative discussions are provided, together with high-quality segmentation results of VCells on a wide range of complex images. The simplicity and efficiency of our model are demonstrated by complexity analysis and by time and accuracy evaluations.

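    For intuition about the underlying clustering view, the sketch below oversegments an image by a plain Lloyd-type centroidal Voronoi iteration (k-means) in a joint position-intensity space; it is written for this summary, while VCells adds edge weighting to the energy and a far more efficient boundary-only update.

    ```python
    # Sketch: Lloyd/k-means iteration in a joint (position, intensity) space to
    # produce roughly uniform superpixels. Illustrative baseline; VCells uses an
    # edge-weighted CVT energy with a much cheaper boundary-only update.
    import numpy as np

    rng = np.random.default_rng(10)
    H, W, K = 64, 64, 16
    img = rng.random((H, W))                            # stand-in grayscale image

    ys, xs = np.mgrid[0:H, 0:W]
    lam = 0.5                                           # hypothetical position/intensity trade-off
    feats = np.stack([lam * ys.ravel() / H, lam * xs.ravel() / W, img.ravel()], axis=1)

    centers = feats[rng.choice(H * W, size=K, replace=False)]
    for _ in range(10):                                 # Lloyd iterations
        d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)                      # Voronoi assignment
        for k in range(K):
            members = feats[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)       # move to the centroid

    superpixels = labels.reshape(H, W)
    print(superpixels.shape, len(np.unique(superpixels)))
    ```
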
  • IEEE Computer Society OnlinePlus Tutorial Video

    Page(s): 1248
    PDF (773 KB)
    Freely Available from IEEE
  • [Inside back cover]

    Page(s): c3
    PDF (201 KB)
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (168 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois