
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 5 • May 2014


Displaying Results 1 - 20 of 20
  • Table of Contents

    Publication Year: 2014 , Page(s): C1
    PDF (356 KB)
    Freely Available from IEEE
  • IEEE Transactions on Pattern Analysis and Machine Intelligence Editorial Board

    Publication Year: 2014 , Page(s): C2
    PDF (314 KB)
    Freely Available from IEEE
  • Automatic Upright Adjustment of Photographs With Robust Camera Calibration

    Publication Year: 2014 , Page(s): 833 - 844
    Multimedia
    PDF (5131 KB) | HTML

    Man-made structures often appear distorted in photos captured by casual photographers, because the scene layout often conflicts with what human perception expects. In this paper, we propose an automatic approach for straightening up slanted man-made structures in an input image to improve its perceptual quality. We call this type of correction upright adjustment. We propose a set of criteria for upright adjustment based on human perception studies, and develop an optimization framework which yields an optimal homography for adjustment. We also develop a new optimization-based camera calibration method that compares favorably with previous methods and allows the proposed system to work reliably on a wide range of images. The effectiveness of our system is demonstrated by both quantitative comparisons and a qualitative user study.

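    A minimal sketch of the final warping step, assuming the optimal homography H has already been estimated by the paper's optimization (which is not reproduced here); the example H below is a made-up rotation-like matrix for illustration only.

        import cv2
        import numpy as np

        def apply_upright_adjustment(image, H):
            """Warp an image with a 3x3 homography H, e.g. one produced by an
            upright-adjustment optimization; H is assumed to map source pixels
            to upright-corrected pixels."""
            h, w = image.shape[:2]
            return cv2.warpPerspective(image, H, (w, h), flags=cv2.INTER_LINEAR)

        # Hypothetical usage with a small rotation-like homography.
        theta = np.deg2rad(3.0)
        H = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0,            0.0,           1.0]])
        img = cv2.imread("building.jpg")  # hypothetical input image
        if img is not None:
            corrected = apply_upright_adjustment(img, H)
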
  • Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy

    Publication Year: 2014 , Page(s): 845 - 859
    PDF (1252 KB) | HTML

    We address the problem of anomaly detection in machine perception. The concept of domain anomaly is introduced as distinct from the conventional notion of anomaly used in the literature. We propose a unified framework for anomaly detection which exposes the multifaceted nature of anomalies, and suggest effective mechanisms for identifying and distinguishing each facet as instruments for domain anomaly detection. The framework draws on the Bayesian probabilistic reasoning apparatus, which clearly defines concepts such as outlier, noise, distribution drift, novelty detection (object, object primitive), rare events, and unexpected events. Based on these concepts, we provide a taxonomy of domain anomaly events. One of the mechanisms helping to pinpoint the nature of an anomaly is based on detecting incongruence between contextual and noncontextual sensor(y) data interpretations. The proposed methodology has wide applicability and underpins, in a unified way, the anomaly detection applications found in the literature. To illustrate some of its distinguishing features, the domain anomaly detection methodology is applied here to the problem of anomaly detection for a video annotation system.

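    A toy illustration of the incongruence mechanism mentioned above, not the authors' framework: flag an anomaly when the contextual and noncontextual posteriors over the same label set disagree strongly; the use of KL divergence and the threshold value are assumptions of this sketch.

        import numpy as np

        def kl_divergence(p, q, eps=1e-12):
            """KL(p || q) between two discrete distributions over the same label set."""
            p = np.asarray(p, dtype=float) + eps
            q = np.asarray(q, dtype=float) + eps
            p, q = p / p.sum(), q / q.sum()
            return float(np.sum(p * np.log(p / q)))

        def incongruence_anomaly(contextual_posterior, noncontextual_posterior, threshold=1.0):
            """Flag a domain anomaly when the contextual and noncontextual
            interpretations of the same observation are strongly incongruent."""
            return kl_divergence(contextual_posterior, noncontextual_posterior) > threshold

        # A detector is confident in class 0 while the scene context favors class 2.
        print(incongruence_anomaly([0.8, 0.1, 0.1], [0.1, 0.1, 0.8]))  # True
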
  • Exemplar-Based Color Constancy and Multiple Illumination

    Publication Year: 2014 , Page(s): 860 - 873
    PDF (1749 KB) | HTML

    Exemplar-based learning or, equally, nearest neighbor methods have recently gained interest from researchers in a variety of computer science domains because of the prevalence of large amounts of accessible data and storage capacity. In computer vision, these types of techniques have been successful in several problems such as scene recognition, shape matching, image parsing, character recognition, and object detection. Applying the concept of exemplar-based learning to the problem of color constancy seems odd at first glance since, first, similar nearest neighbor images are not usually affected by precisely similar illuminants and, second, gathering a dataset consisting of all possible real-world images, including indoor and outdoor scenes, for all possible illuminant colors and intensities is impossible. In this paper, we instead focus on surfaces in the image and address the color constancy problem by unsupervised learning of an appropriate model for each training surface in the training images. We find nearest neighbor models for each surface in a test image and estimate its illumination by comparing the statistics of pixels belonging to the nearest neighbor surfaces and the target surface. The final illumination estimate results from combining these per-surface estimates into a single estimate. We show that the method performs very well on standard datasets compared to current color constancy algorithms, including when a model learned on one image dataset is applied to tests from a different dataset. The proposed method also handles multi-illuminant situations, which most current methods cannot, since they assume the color of the illuminant is constant over the whole image. We show a technique for handling multiple illuminants with the proposed method and test it on images with two distinct sources of illumination using a multiple-illuminant color constancy dataset. The concept proposed here is a new approach to the color constancy problem and provides a simple learning-based framework.

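    A toy sketch of the exemplar idea, not the paper's exact surface models: represent each surface by a simple statistic, transfer the illuminants stored with its nearest training surfaces, and combine the per-surface estimates; the mean-chromaticity feature and the median combination are assumptions of this sketch.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def surface_feature(pixels):
            """Mean chromaticity of a surface's pixels (an (N, 3) RGB array)."""
            mean = np.asarray(pixels, dtype=float).mean(axis=0)
            return mean / (mean.sum() + 1e-12)

        def estimate_illuminant(test_surfaces, train_features, train_illuminants, k=3):
            """Nearest-neighbor illuminant transfer: each test surface votes with the
            illuminants of its k nearest training surfaces (train_features: (M, 3),
            train_illuminants: (M, 3)); the votes are combined by a median."""
            nn = NearestNeighbors(n_neighbors=k).fit(train_features)
            votes = []
            for pixels in test_surfaces:
                _, idx = nn.kneighbors(surface_feature(pixels)[None, :])
                votes.append(np.asarray(train_illuminants)[idx[0]].mean(axis=0))
            est = np.median(np.vstack(votes), axis=0)
            return est / np.linalg.norm(est)
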
  • From Bits to Images: Inversion of Local Binary Descriptors

    Publication Year: 2014 , Page(s): 874 - 887
    PDF (2080 KB) | HTML

    Local Binary Descriptors are becoming more and more popular for image matching tasks, especially on mobile devices. While they are extensively studied in this context, their ability to carry enough information to infer the original image is seldom addressed. In this work, we use an inverse-problem approach to show that it is possible to directly reconstruct the image content from Local Binary Descriptors. This process relies only on very broad assumptions beyond knowledge of the sampling pattern of the descriptor at hand. This generalizes previous results that required either a prior learning database or non-binarized features. Furthermore, our reconstruction scheme reveals differences in the way different Local Binary Descriptors capture and encode image information. The potential applications of our work are therefore manifold, ranging from privacy issues raised by eavesdropping on image keypoints streamed by mobile devices to the design of better descriptors through the visualization and analysis of their geometric content.

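    For context, a minimal sketch of the forward model of a BRIEF-like local binary descriptor (the sampling pattern here is random and purely illustrative); each bit is an inequality constraint on the underlying patch, and it is these constraints that an inverse-problem reconstruction exploits.

        import numpy as np

        def binary_descriptor(patch, pairs):
            """Bit i is 1 if patch[p_i] < patch[q_i]; every bit constrains the patch."""
            patch = np.asarray(patch, dtype=float)
            return np.array([1 if patch[p] < patch[q] else 0 for p, q in pairs],
                            dtype=np.uint8)

        # Hypothetical 8x8 patch and a random test pattern (real descriptors use
        # fixed, published sampling patterns).
        rng = np.random.default_rng(0)
        patch = rng.random((8, 8))
        pairs = [((rng.integers(8), rng.integers(8)),
                  (rng.integers(8), rng.integers(8))) for _ in range(16)]
        print(binary_descriptor(patch, pairs))
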
  • Gaussian Process-Mixture Conditional Heteroscedasticity

    Publication Year: 2014 , Page(s): 888 - 900
    PDF (1302 KB) | HTML

    Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. In this way, we obtain a Gaussian process-mixture conditional heteroscedasticity (GPMCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law nature over the distribution of the model mixture components, namely the Pitman-Yor process prior, to better capture modeled data distributions with heavy tails and skewness. Finally, we provide a copula-based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated GPMCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios, and compare its performance to state-of-the-art methodologies.

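    For reference, the classic GARCH(1,1) recursion that the GPMCH model is positioned against; this simulation sketch shows only the baseline, not the proposed Gaussian process mixture.

        import numpy as np

        def simulate_garch_1_1(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
            """Simulate returns r_t = sigma_t * eps_t with
            sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
            rng = np.random.default_rng(seed)
            r = np.zeros(n)
            sigma2 = np.full(n, omega / (1.0 - alpha - beta))  # start at the unconditional variance
            for t in range(1, n):
                sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
                r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
            return r, np.sqrt(sigma2)

        returns, volatility = simulate_garch_1_1(1000)
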
  • Geodesic Mapping for Dynamic Surface Alignment

    Publication Year: 2014 , Page(s): 901 - 913
    Multimedia
    PDF (2040 KB) | HTML

    This paper presents a novel approach that achieves dynamic surface alignment by geodesic mapping. The surfaces are 3D manifold meshes representing non-rigid objects in motion (e.g., humans), which can be obtained by multiview stereo reconstruction. The proposed framework consists of a geodesic mapping (i.e., geodesic diffeomorphism) between surfaces which carry a distance function (namely the global geodesic distance), and a geodesic-based coordinate system (namely the global geodesic coordinates) defined similarly to generalized barycentric coordinates. The coordinates are used to recursively choose correspondence points in non-ambiguous regions using a coarse-to-fine strategy, to reliably locate all surface points, and to define a discrete mapping. Complete point-to-point surface alignment with smooth mapping is then derived by optimizing a piecewise objective function within a probabilistic framework. The proposed technique relies only on intrinsic surface geometry, and does not require prior knowledge of surface appearance (e.g., color or texture), shape (e.g., topology), or parameterization (e.g., mesh connectivity or complexity). The method can be used for numerous applications, such as transfer of visual information (e.g., texture) between surface models representing different objects, dense motion flow estimation of 3D dynamic surfaces, wide-timeframe matching, etc. Experiments show compelling results on challenging, publicly available real-world datasets.

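    The framework is built on global geodesic distances over the mesh; below is a minimal graph-based approximation (Dijkstra on the edge graph weighted by edge length), which is coarser than an exact geodesic computation but illustrates the quantity involved.

        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import dijkstra

        def mesh_geodesic_distances(vertices, faces, source):
            """Approximate geodesic distances from vertex `source` to all vertices
            by shortest paths on the mesh edge graph (vertices: (N, 3), faces: (F, 3))."""
            vertices = np.asarray(vertices, dtype=float)
            edges = {}
            for tri in faces:
                for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
                    edges[(min(a, b), max(a, b))] = np.linalg.norm(vertices[a] - vertices[b])
            rows, cols, weights = [], [], []
            for (a, b), w in edges.items():
                rows += [a, b]
                cols += [b, a]
                weights += [w, w]
            graph = csr_matrix((weights, (rows, cols)), shape=(len(vertices), len(vertices)))
            return dijkstra(graph, indices=source)
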
  • Learning Actionlet Ensemble for 3D Human Action Recognition

    Publication Year: 2014 , Page(s): 914 - 927
    PDF (2125 KB) | HTML

    Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variations, and complicated temporal structures. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only facilitates rather powerful human motion capture, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this paper, we propose to characterize human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with a Kinect device, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves performance superior to state-of-the-art algorithms.

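    A toy sketch of the kind of skeleton feature such models are built on: per-frame pairwise relative joint positions, which are translation invariant; the paper's actual actionlet features additionally use Fourier temporal pyramids and depth-based occupancy features, which are not reproduced here.

        import numpy as np

        def relative_joint_features(skeleton_sequence):
            """skeleton_sequence: (T, J, 3) array of 3D joint positions per frame.
            Returns per-frame pairwise joint differences of shape (T, J*(J-1)//2, 3)."""
            seq = np.asarray(skeleton_sequence, dtype=float)
            i, j = np.triu_indices(seq.shape[1], k=1)
            return seq[:, i, :] - seq[:, j, :]

        # Hypothetical 20-joint skeleton tracked over 30 frames.
        print(relative_joint_features(np.random.rand(30, 20, 3)).shape)  # (30, 190, 3)
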
  • Learning Categories From Few Examples With Multi Model Knowledge Transfer

    Publication Year: 2014 , Page(s): 928 - 941
    PDF (1392 KB) | HTML

    Learning a visual object category from few samples is a compelling and challenging problem. In several real-world applications, collecting large amounts of annotated data is costly and not always possible. However, a small training set cannot cover the high intraclass variability typical of visual objects, and in this condition machine learning methods provide very few guarantees. This paper presents a discriminative model adaptation algorithm able to proficiently learn a target object with few examples by relying on other, previously learned source categories. The proposed method autonomously chooses from where and how much to transfer information by solving a convex optimization problem which minimizes the leave-one-out error on the available training set. We analyze several properties of the described approach and perform an extensive experimental comparison with other existing transfer solutions, consistently showing the value of our algorithm.

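    A toy sketch of the underlying idea of deciding how much to transfer by minimizing leave-one-out error: here a single source model's precomputed decision scores are blended with a target model trained on the few available samples, and the blend weight is chosen by grid search (the paper instead solves a convex problem over multiple sources). All names and the grid-search formulation are assumptions of this sketch.

        import numpy as np
        from sklearn.model_selection import LeaveOneOut
        from sklearn.svm import LinearSVC

        def choose_transfer_weight(X, y, source_scores, betas=np.linspace(0.0, 1.0, 11)):
            """Pick beta for combined = beta * source_score + (1 - beta) * target_score
            by minimizing leave-one-out error on the small target set (y in {-1, +1},
            source_scores: decision values of a previously learned source model on X)."""
            best_beta, best_err = 0.0, np.inf
            for beta in betas:
                errors = 0
                for train_idx, test_idx in LeaveOneOut().split(X):
                    target = LinearSVC().fit(X[train_idx], y[train_idx])
                    score = (beta * source_scores[test_idx][0]
                             + (1.0 - beta) * target.decision_function(X[test_idx])[0])
                    errors += int(np.sign(score) != y[test_idx][0])
                if errors < best_err:
                    best_beta, best_err = beta, errors
            return best_beta
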
  • Learning Nonlinear Functions Using Regularized Greedy Forest

    Publication Year: 2014 , Page(s): 942 - 954
    Multimedia
    PDF (1454 KB) | HTML

    We consider the problem of learning a forest of nonlinear decision rules with general loss functions. Standard methods employ boosted decision trees, such as AdaBoost for the exponential loss and Friedman's gradient boosting for general losses. In contrast to these traditional boosting algorithms, which treat a tree learner as a black box, the method we propose directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure. Our method achieves higher accuracy and smaller models than gradient boosting on many of the datasets we tested.

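    For comparison, a minimal scikit-learn sketch of the gradient-boosting baseline the abstract refers to, in which each stage fits a small tree to the loss gradient and is never revisited; the regularized greedy forest itself is not part of scikit-learn and is not reproduced here.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        # Black-box-tree boosting baseline (RGF instead performs fully-corrective
        # updates over the whole forest with explicit regularization).
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        gbdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
        gbdt.fit(X_tr, y_tr)
        print("gradient boosting baseline accuracy:", gbdt.score(X_te, y_te))
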
  • Localized Dictionaries Based Orientation Field Estimation for Latent Fingerprints

    Publication Year: 2014 , Page(s): 955 - 969
    PDF (5726 KB) | HTML

    Dictionary-based orientation field estimation has shown promising performance for latent fingerprints. In this paper, we seek to exploit stronger prior knowledge of fingerprints in order to further improve performance. Recognizing that ridge orientations at different locations of a fingerprint have different characteristics, we propose a localized-dictionaries-based orientation field estimation algorithm, in which a noisy orientation patch at a given location, output by a local estimation approach, is replaced by a real orientation patch from the local dictionary at the same location. The precondition for applying localized dictionaries is that the pose of the latent fingerprint must be estimated. We propose a Hough transform-based fingerprint pose estimation algorithm, in which the predictions about fingerprint pose made by all orientation patches in the latent fingerprint are accumulated. Experimental results on challenging latent fingerprint datasets show that the proposed method markedly outperforms previous ones.

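    A toy sketch of the Hough-style accumulation described above: each orientation patch casts a vote for a candidate fingerprint pose (x, y, theta), and the densest accumulator cell wins. How votes are generated from patches is specific to the paper and is left abstract here.

        import numpy as np

        def hough_pose_estimate(votes, image_shape, cell=16, angle_bins=36):
            """votes: iterable of (x, y, theta) pose hypotheses, theta in radians.
            Accumulate on a coarse (y, x, theta) grid and return the peak pose."""
            h, w = image_shape
            acc = np.zeros((h // cell + 1, w // cell + 1, angle_bins))
            for x, y, theta in votes:
                a = int((theta % (2 * np.pi)) / (2 * np.pi) * angle_bins) % angle_bins
                acc[int(y) // cell, int(x) // cell, a] += 1
            iy, ix, ia = np.unravel_index(np.argmax(acc), acc.shape)
            return (ix * cell + cell // 2, iy * cell + cell // 2,
                    2 * np.pi * (ia + 0.5) / angle_bins)
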
  • Robust Text Detection in Natural Scene Images

    Publication Year: 2014 , Page(s): 970 - 983
    Cited by:  Papers (1)
    PDF (1964 KB) | HTML

    Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose an accurate and robust method for detecting text in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using a strategy of minimizing regularized variations. Character candidates are grouped into text candidates by a single-link clustering algorithm, whose distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probability of each text candidate being non-text is estimated with a character classifier; text candidates with high non-text probabilities are eliminated, and the remaining texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation, and even born-digital databases also demonstrate the effectiveness of the proposed method.

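    A minimal OpenCV sketch of the first stage only, extracting MSERs as coarse character candidates; the regularized-variation pruning, clustering, and classification stages are specific to the paper and are not reproduced here.

        import cv2

        def mser_character_candidates(image_path):
            """Extract Maximally Stable Extremal Regions as raw character candidates."""
            gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            mser = cv2.MSER_create()  # default parameters; delta and area limits can be tuned
            regions, bboxes = mser.detectRegions(gray)
            return regions, bboxes
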
  • Support Vector Machine Classifier With Pinball Loss

    Publication Year: 2014 , Page(s): 984 - 997
    PDF (2029 KB) | HTML

    Traditionally, the hinge loss is used to construct support vector machine (SVM) classifiers. The hinge loss is related to the shortest distance between sets, and the corresponding classifier is hence sensitive to noise and unstable under re-sampling. In contrast, the pinball loss is related to the quantile distance, and the resulting classifier is less sensitive. The pinball loss has been deeply studied and widely applied in regression, but it has not been used for classification. In this paper, we propose an SVM classifier with the pinball loss, called pin-SVM, and investigate its properties, including noise insensitivity, robustness, and misclassification error. In addition, an insensitive zone is applied to the pin-SVM to obtain a sparse model. Compared to the SVM with the hinge loss, the proposed pin-SVM has the same computational complexity while enjoying noise insensitivity and re-sampling stability.

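    For reference, the two losses being compared, written as functions of the margin term u = 1 - y*f(x); the pinball loss also penalizes points classified well beyond the margin (slope -tau for u < 0), which is the source of its quantile-like, noise-insensitive behaviour, and setting tau = 0 recovers the hinge loss.

        import numpy as np

        def hinge_loss(u):
            """Standard hinge loss max(u, 0), with u = 1 - y * f(x)."""
            return np.maximum(u, 0.0)

        def pinball_loss(u, tau=0.5):
            """Pinball loss: u for u >= 0 and -tau * u for u < 0."""
            u = np.asarray(u, dtype=float)
            return np.where(u >= 0.0, u, -tau * u)

        margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
        print(hinge_loss(margins))         # [0.   0.   0.   0.5  2. ]
        print(pinball_loss(margins, 0.5))  # [1.   0.25 0.   0.5  2. ]
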
  • 2D Affine and Projective Shape Analysis

    Publication Year: 2014 , Page(s): 998 - 1011
    Multimedia
    PDF (2038 KB) | HTML

    Current techniques for shape analysis tend to seek invariance to similarity transformations (rotation, translation, and scale), but certain imaging situations require invariance to larger groups, such as the affine or projective group. Here we present a general Riemannian framework for shape analysis of planar objects in which metrics and related quantities are invariant to affine and projective transformations. Highlighting two possibilities for representing object boundaries, ordered points (or landmarks) and parameterized curves, we study different combinations of these representations (points and curves) and transformations (affine and projective). Specifically, we provide solutions to three out of the four situations, and develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models and to classifying test shapes using such models learned from training data. In the case of parameterized curves, we also achieve the desired goal of invariance to re-parameterizations. The geodesics are constructed by particularizing the path-straightening algorithm to the geometries of the current manifolds and are used, in turn, to compute shape statistics and Gaussian-type shape models. We demonstrate these ideas using a number of examples from shape and activity recognition.

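    A minimal sketch of the affine standardization step that affine-invariant landmark analysis typically starts from: center the configuration and whiten its covariance, so that affinely related configurations agree up to rotation; the Riemannian metrics, geodesics, and projective case developed in the paper are not reproduced here.

        import numpy as np

        def affine_standardize(landmarks):
            """landmarks: (n, 2) ordered boundary points. Center and whiten so the
            result is invariant, up to rotation, to affine transforms of the input."""
            X = np.asarray(landmarks, dtype=float)
            X = X - X.mean(axis=0)
            cov = X.T @ X / len(X)
            w, V = np.linalg.eigh(cov)
            whitener = V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T
            return X @ whitener

        # Two affinely related squares standardize to point sets with equal norms.
        square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
        A = np.array([[2.0, 0.7], [0.0, 0.5]])  # arbitrary invertible affine map
        print(np.allclose(np.linalg.norm(affine_standardize(square), axis=1),
                          np.linalg.norm(affine_standardize(square @ A.T), axis=1)))  # True
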
  • 3D Traffic Scene Understanding From Movable Platforms

    Publication Year: 2014 , Page(s): 1012 - 1025
    Cited by:  Papers (2)
    Multimedia
    PDF (1821 KB) | HTML

    In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms, which reasons jointly about the 3D scene layout and the locations and orientations of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar, or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow, and occupancy grids. For each of these cues, we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments with different feature combinations are conducted. Furthermore, we show that, by employing context derived from the proposed method, we are able to improve over the state of the art in object detection and object orientation estimation in challenging and cluttered urban environments.

  • Hardware-Efficient Bilateral Filtering for Stereo Matching

    Publication Year: 2014 , Page(s): 1026 - 1032
    Cited by:  Papers (1)
    Multimedia
    PDF (954 KB) | HTML

    This paper presents a new bilateral filtering method specially designed for practical stereo vision systems. Parallel algorithms are preferred in these systems due to real-time performance requirements. Edge-preserving filters like the bilateral filter have been demonstrated to be very effective for high-quality local stereo matching. A hardware-efficient bilateral filter is thus proposed in this paper. When implemented on an NVIDIA GeForce GTX 580 GPU, it can process a one-megapixel color image at around 417 frames per second. This filter can be directly used for the cost aggregation required in any local stereo matching algorithm. Quantitative evaluation shows that it outperforms all other local stereo methods in terms of both accuracy and speed on the Middlebury benchmark. It ranks 12th out of over 120 methods on the Middlebury datasets, and the average runtime (including matching cost computation, occlusion handling, and post-processing) is only 15 milliseconds (67 frames per second).

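    For orientation, the classic bilateral filter that the paper makes hardware efficient, here simply via OpenCV's built-in implementation; in local stereo matching the same space/range weights are applied to the matching-cost volume rather than to the image itself.

        import cv2

        def bilateral_smooth(image_path, d=9, sigma_color=75, sigma_space=75):
            """Edge-preserving smoothing with the standard bilateral filter."""
            img = cv2.imread(image_path)
            return cv2.bilateralFilter(img, d, sigma_color, sigma_space)
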
  • Local Pyramidal Descriptors for Image Recognition

    Publication Year: 2014 , Page(s): 1033 - 1040
    PDF (1195 KB) | HTML

    In this paper, we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space. We propose that image patches be represented at multiple levels of descriptor detail, and that these levels be defined in terms of local spatial pooling resolution. Preserving multiple levels of detail in local descriptors is a way of hedging one's bets on which levels will be most relevant for matching during learning and recognition. We introduce the Pyramid SIFT (P-SIFT) descriptor and show that its use in four state-of-the-art image recognition pipelines improves accuracy and yields state-of-the-art results. Our technique is applicable independently of spatial pyramid matching, and we show that spatial pyramids can be combined with local pyramids to obtain further improvement. We achieve state-of-the-art results on Caltech-101 (80.1%) and Caltech-256 (52.6%) when compared to other approaches based on SIFT features over intensity images. Our technique is efficient and extremely easy to integrate into image recognition pipelines.

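    A toy sketch of the local-pyramid idea: pool the same grid of orientation histograms at several spatial resolutions and concatenate the levels, so that matching can later decide how much spatial detail to rely on; the actual P-SIFT construction and normalization follow the paper and are not reproduced here.

        import numpy as np

        def local_pyramid_descriptor(cell_histograms, levels=(4, 2, 1)):
            """cell_histograms: (4, 4, B) grid of B-bin orientation histograms.
            Returns the concatenation of the grid pooled at 4x4, 2x2, and 1x1."""
            h = np.asarray(cell_histograms, dtype=float)
            parts = []
            for g in levels:
                s = 4 // g  # pooling factor relative to the 4x4 base grid
                parts.append(h.reshape(g, s, g, s, -1).sum(axis=(1, 3)).reshape(-1))
            desc = np.concatenate(parts)
            return desc / (np.linalg.norm(desc) + 1e-12)

        # A 4x4 grid of 8-bin histograms yields 16*8 + 4*8 + 1*8 = 168 dimensions.
        print(local_pyramid_descriptor(np.random.rand(4, 4, 8)).shape)  # (168,)
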
  • IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors

    Publication Year: 2014 , Page(s): C3
    PDF (314 KB)
    Freely Available from IEEE
  • IEEE Computer Society

    Publication Year: 2014 , Page(s): C4
    PDF (351 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois