IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 5 • May 2013

  • [Table of contents]

    Page(s): c1
  • Cover2

    Page(s): c2
  • A Convex Formulation for Learning a Shared Predictive Structure from Multiple Tasks

    Page(s): 1025 - 1038

    In this paper, we consider the problem of learning from multiple related tasks for improved generalization performance by extracting their shared structures. The alternating structure optimization (ASO) algorithm, which couples all tasks using a shared feature representation, has been successfully applied in various multitask learning problems. However, ASO is nonconvex and the alternating algorithm only finds a local solution. We first present an improved ASO formulation (iASO) for multitask learning based on a new regularizer. We then convert iASO, a nonconvex formulation, into a relaxed convex one (rASO). Interestingly, our theoretical analysis reveals that rASO finds a globally optimal solution to its nonconvex counterpart iASO under certain conditions. rASO can be equivalently reformulated as a semidefinite program (SDP), which is, however, not scalable to large datasets. We propose to employ the block coordinate descent (BCD) method and the accelerated projected gradient (APG) algorithm separately to find the globally optimal solution to rASO; we also develop efficient algorithms for solving the key subproblems involved in BCD and APG. The experiments on the Yahoo webpages datasets and the Drosophila gene expression pattern images datasets demonstrate the effectiveness and efficiency of the proposed algorithms and confirm our theoretical analysis.

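    A minimal alternating-minimization sketch of the shared-structure idea in Python/NumPy (an illustration only: the simplified objective, names, and parameters below are assumptions, not the paper's rASO formulation or its BCD/APG solvers):

        import numpy as np

        def aso_sketch(Xs, ys, h=2, alpha=1.0, iters=20, seed=0):
            # Xs: list of (n_t, d) task design matrices; ys: list of (n_t,) targets.
            # Assumes h <= number of tasks.
            rng = np.random.default_rng(seed)
            d, T = Xs[0].shape[1], len(Xs)
            Theta = np.linalg.qr(rng.standard_normal((d, h)))[0].T  # h x d shared subspace
            W = np.zeros((d, T))
            for _ in range(iters):
                # Penalize the part of each task's weights outside the shared
                # subspace: min_w ||X w - y||^2 + alpha ||(I - Theta^T Theta) w||^2.
                P = np.eye(d) - Theta.T @ Theta
                for t in range(T):
                    A = Xs[t].T @ Xs[t] + alpha * P + 1e-8 * np.eye(d)
                    W[:, t] = np.linalg.solve(A, Xs[t].T @ ys[t])
                # Re-fit the shared subspace to the current task weights.
                Theta = np.linalg.svd(W, full_matrices=False)[0][:, :h].T
            return W, Theta
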
  • Algorithms for 3D Shape Scanning with a Depth Camera

    Page(s): 1039 - 1050

    We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a Time-of-Flight (ToF) camera. These ToF cameras can measure depth scans at video rate. Because the underlying technology is comparatively simple, they have the potential to be mass-produced at low cost. Our easy-to-use, cost-effective scanning solution, which is based on such a sensor, could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a nontrivial systematic bias. In this paper, we show the surprising result that 3D scans of reasonable quality can also be obtained with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method and a probabilistic scan alignment approach that explicitly takes into account the sensor's noise characteristics.

  • An Incremental DPMM-Based Method for Trajectory Clustering, Modeling, and Retrieval

    Page(s): 1051 - 1065

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without retraining on the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster to learn the trajectory pattern that represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.

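    For flavor, a batch stand-in using scikit-learn's truncated Dirichlet process mixture; the paper's incremental updates, tDPMM pattern models, and likelihood estimator are not reproduced, and the fixed-length resampling of trajectories is an assumption:

        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        def cluster_trajectories(trajs, n_points=8, max_clusters=20):
            # trajs: list of (len_i, 2) arrays of 2D positions.
            feats = []
            for t in trajs:
                idx = np.linspace(0, len(t) - 1, n_points).astype(int)
                feats.append(np.asarray(t)[idx].ravel())   # fixed-length feature
            X = np.asarray(feats)
            # The Dirichlet process prior lets the model switch off unneeded
            # components, so the cluster count is inferred rather than fixed.
            dpmm = BayesianGaussianMixture(
                n_components=max_clusters,
                weight_concentration_prior_type="dirichlet_process",
                covariance_type="diag",
                max_iter=500,
            ).fit(X)
            return dpmm.predict(X), dpmm
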
  • Hough Forest Random Field for Object Recognition and Segmentation

    Page(s): 1066 - 1079

    This paper presents a new computational framework for detecting and segmenting object occurrences in images. We combine Hough forest (HF) and conditional random field (CRF) into HFRF to assign labels of object classes to image regions. HF captures intrinsic and contextual properties of objects. CRF then fuses the labeling hypotheses generated by HF to identify every object occurrence. Interaction between HF and CRF happens in HFRF inference, which uses the Metropolis-Hastings algorithm. The Metropolis-Hastings reversible jumps depend on two ratios of proposal and posterior distributions. Instead of estimating four distributions, we directly compute the two ratios using HF. In leaf nodes, HF records class histograms of training examples and information about their configurations. This evidence is used in inference for nonparametric estimation of the two distribution ratios. Our empirical evaluation on benchmark datasets demonstrates higher average precision rates of object detection, smaller object segmentation error, and faster convergence rates of our inference, relative to the state of the art. The paper also presents theoretical error bounds of HF and HFRF applied to two-class object detection and segmentation.

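    A generic Metropolis-Hastings step, included to make the "two ratios" concrete: acceptance needs only the posterior ratio and the proposal ratio, which is the quantity HFRF estimates directly from leaf statistics. log_post, propose, and log_q are user-supplied placeholders; nothing below is specific to Hough forests:

        import math, random

        def metropolis_hastings(log_post, propose, log_q, x0, n_steps=1000):
            x, samples = x0, []
            for _ in range(n_steps):
                x_new = propose(x)
                log_a = (log_post(x_new) - log_post(x)          # posterior ratio
                         + log_q(x, x_new) - log_q(x_new, x))   # proposal ratio
                if random.random() < math.exp(min(0.0, log_a)):
                    x = x_new                                    # accept the jump
                samples.append(x)
            return samples
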
  • Inverse Rendering of Faces with a 3D Morphable Model

    Page(s): 1080 - 1093

    In this paper, we present a complete framework to inverse render faces with a 3D Morphable Model (3DMM). By decomposing the image formation process into geometric and photometric parts, we are able to state the problem as a multilinear system which can be solved accurately and efficiently. As we treat each contribution as independent, the objective function is convex in the parameters and a global solution is guaranteed. We start by recovering 3D shape using a novel algorithm which incorporates generalization error of the model obtained from empirical measurements. We then describe two methods to recover facial texture, diffuse lighting, specular reflectance, and camera properties from a single image. The methods make increasingly weak assumptions and can be solved in a linear fashion. We evaluate our findings on a publicly available database, where we are able to outperform an existing state-of-the-art algorithm. We demonstrate the usability of the recovered parameters in a recognition experiment conducted on the CMU-PIE database.

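    As a flavor of the photometric stage only, a generic least-squares solve for Lambertian lighting from known per-pixel normals and albedo; the 9-term spherical-harmonic basis is a standard assumption here, not the paper's exact parameterization, and the shape, specular, and camera estimation steps are omitted:

        import numpy as np

        def sh_basis(n):
            # 9-term spherical-harmonic basis from unit normals n: (N, 3).
            x, y, z = n[:, 0], n[:, 1], n[:, 2]
            return np.stack([np.ones_like(x), x, y, z, x * y, x * z, y * z,
                             x ** 2 - y ** 2, 3 * z ** 2 - 1], axis=1)

        def estimate_lighting(intensity, normals, albedo):
            # Solve intensity ~= albedo * (B @ l) for lighting coefficients l.
            B = albedo[:, None] * sh_basis(normals)
            l, *_ = np.linalg.lstsq(B, intensity, rcond=None)
            return l
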
  • Joint Depth Map and Color Consistency Estimation for Stereo Images with Different Illuminations and Cameras

    Page(s): 1094 - 1106

    In this paper, we propose a method that infers both accurate depth maps and color-consistent stereo images for radiometrically varying stereo images. In general, stereo matching and enforcing color consistency between stereo images pose a chicken-and-egg problem, since it is not trivial to achieve both goals simultaneously. Hence, we have developed an iterative framework in which these two processes can boost each other. First, we transform the input color images to log-chromaticity color space, in which a linear relationship can be established when constructing a joint pdf of the transformed left and right color images. From this joint pdf, we can estimate a linear function that relates the corresponding pixels in the stereo images. Based on this linear property, we present a new stereo matching cost that combines Mutual Information (MI), the SIFT descriptor, and segment-based plane-fitting to robustly find correspondences for stereo image pairs that undergo radiometric variations. Meanwhile, we devise a Stereo Color Histogram Equalization (SCHE) method to produce color-consistent stereo image pairs, which in turn boost the disparity map estimation. Experimental results show that our method produces both accurate depth maps and color-consistent stereo images, even for stereo images with severe radiometric differences.

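    A small sketch of the color-space step, assuming the common geometric-mean form of log-chromaticity (the paper's exact normalization, joint-pdf modeling, and SCHE are not reproduced):

        import numpy as np

        def log_chromaticity(img):
            # img: (H, W, 3) RGB, positive floats.  Per-channel multiplicative
            # illumination changes become additive offsets in this space,
            # which is what makes a linear left/right relation plausible.
            img = np.clip(img.astype(np.float64), 1e-6, None)
            geo_mean = img.prod(axis=2) ** (1.0 / 3.0)
            return np.log(img / geo_mean[..., None])

        # Joint pdf of corresponding pixels (single channel shown):
        # H, _, _ = np.histogram2d(lc_l[..., 0].ravel(), lc_r[..., 0].ravel(),
        #                          bins=64, density=True)
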
  • Learning a Confidence Measure for Optical Flow

    Page(s): 1107 - 1120

    We present a supervised learning-based method to estimate a per-pixel confidence for optical flow vectors. Regions of low texture and pixels close to occlusion boundaries are known to be difficult for optical flow algorithms. Using a spatiotemporal feature vector, we estimate whether a flow algorithm is likely to fail in a given region. Our method is not restricted to any specific class of flow algorithm and does not make any scene-specific assumptions. By automatically learning this confidence, we can combine the output of several flow fields computed by different algorithms and select the best-performing algorithm per pixel. Our optical flow confidence measure allows one to achieve better overall results by discarding the most troublesome pixels. We illustrate the effectiveness of our method on four different optical flow algorithms over a variety of real and synthetic sequences. For algorithm selection, we achieve the top overall results on a large test set, and at times even surpass the results of the best algorithm among the candidates.

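    A hedged sketch of the learning setup: train a classifier to predict per-pixel failure of one flow algorithm and read its class probability as confidence. The failure threshold, feature design, and forest choice are illustrative assumptions, not the paper's configuration:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def train_confidence(feats, epe, thresh=3.0):
            # feats: (n_pixels, n_features) spatiotemporal features.
            # epe:   (n_pixels,) endpoint error against ground-truth flow.
            labels = (epe > thresh).astype(int)        # 1 = flow failed here
            clf = RandomForestClassifier(n_estimators=100).fit(feats, labels)
            fail = list(clf.classes_).index(1)
            return lambda F: 1.0 - clf.predict_proba(F)[:, fail]  # P(success)
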
  • Learning Topic Models by Belief Propagation

    Page(s): 1121 - 1134

    Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision, and computational biology. This paper represents the collapsed LDA as a factor graph, which enables the classic loopy belief propagation (BP) algorithm to be used for approximate inference and parameter estimation. Although the two commonly used approximate inference methods, variational Bayes (VB) and collapsed Gibbs sampling (GS), have been very successful in learning LDA, the proposed BP is competitive in both speed and accuracy, as validated by encouraging experimental results on four large-scale document datasets. Furthermore, the BP algorithm has the potential to become a generic scheme for learning variants of LDA-based topic models in the collapsed space. To this end, we show how to learn two typical variants of LDA-based topic models, the author-topic model (ATM) and the relational topic model (RTM), using BP based on their factor graph representations.

  • Linear Dependency Modeling for Classifier Fusion and Feature Combination

    Page(s): 1135 - 1148

    This paper addresses the independence assumption made in the fusion process. In the last decade, dependency modeling techniques were developed under a specific distribution of classifiers or by estimating the joint distribution of the posteriors. This paper proposes a new framework to model the dependency between features without any assumption on the feature/classifier distribution, and overcomes the difficulty of estimating the high-dimensional joint density. We prove that feature dependency can be modeled by a linear combination of the posterior probabilities under some mild assumptions. Based on this linear combination property, two methods, namely Linear Classifier Dependency Modeling (LCDM) and Linear Feature Dependency Modeling (LFDM), are derived and developed for dependency modeling at the classifier level and the feature level, respectively. The optimal models for LCDM and LFDM are learned by maximizing the margin between the genuine and impostor posterior probabilities. Both synthetic data and real datasets are used for experiments. Experimental results show that LCDM and LFDM with dependency modeling outperform existing classifier-level and feature-level combination methods under nonnormal distributions and on four real databases, respectively. Comparing the classifier-level and feature-level fusion methods, LFDM gives the best performance.

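    A loose, margin-based sketch of linear posterior fusion (a stand-in for LCDM only: a linear SVM supplies the margin maximization, and the inputs and normalization are assumptions):

        import numpy as np
        from sklearn.svm import LinearSVC

        def learn_fusion_weights(P, y):
            # P: (n_samples, n_classifiers) posteriors for the target class
            #    from each classifier; y: 1 = genuine, 0 = impostor.
            svm = LinearSVC(C=1.0).fit(P, y)    # maximize genuine/impostor margin
            w = svm.coef_.ravel()
            return w / np.abs(w).sum()          # normalized fusion weights

        # Fused score for new posteriors P_new:  s = P_new @ w
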
  • Local Evidence Aggregation for Regression-Based Facial Point Detection

    Page(s): 1149 - 1163

    We propose a new algorithm to detect facial points in frontal and near-frontal face images. It combines a regression-based approach with a probabilistic graphical model-based face shape model that restricts the search to anthropomorphically consistent regions. While most regression-based approaches perform a sequential approximation of the target location, our algorithm detects the target location by aggregating the estimates obtained from stochastically selected local appearance information into a single robust prediction. The underlying assumption is that by aggregating the different estimates, their errors will cancel out as long as the regressor inputs are uncorrelated. Once this new perspective is adopted, the problem is reformulated as how to optimally select the test locations over which the regressors are evaluated. We propose to extend the regression-based model to provide a quality measure of each prediction, and use the shape model to restrict and correct the sampling region. Our approach combines the low computational cost typical of regression-based approaches with the robustness of exhaustive-search approaches. The proposed algorithm was tested on over 7,500 images from five databases. Results showed significant improvement over the current state of the art.

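    The aggregation idea in miniature, assuming each stochastically sampled patch has already produced an offset vote and a quality weight (hypothetical inputs); a weighted median fuses the votes so that uncorrelated errors cancel:

        import numpy as np

        def aggregate_votes(patch_centers, offsets, weights):
            # patch_centers, offsets: (n_votes, 2); weights: (n_votes,).
            votes = patch_centers + offsets          # each patch's point estimate
            est = []
            for d in range(votes.shape[1]):          # weighted median per coordinate
                order = np.argsort(votes[:, d])
                v, w = votes[order, d], weights[order]
                cdf = np.cumsum(w) / w.sum()
                est.append(v[np.searchsorted(cdf, 0.5)])
            return np.array(est)
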
  • Multiscale Local Phase Quantization for Robust Component-Based Face Recognition Using Kernel Fusion of Multiple Descriptors

    Page(s): 1164 - 1177

    Face recognition subject to uncontrolled illumination and blur is challenging. Interestingly, image degradation caused by blurring, often present in real-world imagery, has mostly been overlooked by the face recognition community. Such degradation corrupts face information and affects image alignment, which together negatively impact recognition accuracy. We propose a number of countermeasures designed to achieve system robustness to blurring. First, we propose a novel blur-robust face image descriptor based on Local Phase Quantization (LPQ) and extend it to a multiscale framework (MLPQ) to increase its effectiveness. To maximize the insensitivity to misalignment, the MLPQ descriptor is computed regionally by adopting a component-based framework. Second, the regional features are combined using kernel fusion. Third, the proposed MLPQ representation is combined with the Multiscale Local Binary Pattern (MLBP) descriptor using kernel fusion to increase insensitivity to illumination. Kernel Discriminant Analysis (KDA) of the combined features extracts discriminative information for face recognition. Last, two geometric normalizations are used to generate and combine multiple scores from different face image scales to further enhance the accuracy. The proposed approach has been comprehensively evaluated using the combined Yale and Extended Yale database B (degraded by artificially induced linear motion blur) as well as the FERET, FRGC 2.0, and LFW databases. The combined system is comparable to state-of-the-art approaches using similar system configurations. The reported work provides a new insight into the merits of various face representation and fusion methods, as well as their role in dealing with variable lighting and blur degradation.

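    To make the descriptor concrete, a single-scale LPQ sketch: an M x M short-term Fourier transform at four low frequencies, with the signs of the real and imaginary parts quantized into an 8-bit code per pixel. The decorrelation step and the multiscale, component-based, and kernel-fusion machinery are omitted; M=7 is an assumption:

        import numpy as np
        from scipy.signal import convolve2d

        def lpq_hist(img, M=7):
            r = (M - 1) // 2
            x = np.arange(-r, r + 1)
            w0 = np.ones(M, dtype=complex)             # DC filter
            w1 = np.exp(-2j * np.pi * x / M)           # frequency a = 1/M
            def stft(wy, wx):                          # separable STFT component
                tmp = convolve2d(img.astype(float), wy[:, None], mode='valid')
                return convolve2d(tmp, wx[None, :], mode='valid')
            F = [stft(w0, w1), stft(w1, w0), stft(w1, w1), stft(w1, np.conj(w1))]
            code = np.zeros(F[0].shape, dtype=np.int32)
            for k, f in enumerate(F):                  # 2 sign bits per frequency
                code |= (f.real > 0).astype(np.int32) << (2 * k)
                code |= (f.imag > 0).astype(np.int32) << (2 * k + 1)
            return np.bincount(code.ravel(), minlength=256)  # 256-bin histogram
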
  • Online Feature Selection with Streaming Features

    Page(s): 1178 - 1192

    We propose a new online feature selection framework for applications with streaming features, where the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time, whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

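    A skeleton of the streaming protocol, with plain correlations standing in for the paper's relevance and redundancy tests (the thresholds and tests are assumptions):

        import numpy as np

        def osfs_sketch(feature_stream, y, rel_thresh=0.1, red_thresh=0.95):
            # feature_stream yields (name, values) pairs one feature at a time;
            # y: (n_samples,) target.  The sample count stays fixed throughout.
            selected = {}
            for name, f in feature_stream:
                if abs(np.corrcoef(f, y)[0, 1]) < rel_thresh:
                    continue                       # not relevant: discard on arrival
                for old in list(selected):
                    g = selected[old]
                    if (abs(np.corrcoef(g, f)[0, 1]) > red_thresh and
                            abs(np.corrcoef(g, y)[0, 1]) <= abs(np.corrcoef(f, y)[0, 1])):
                        del selected[old]          # newcomer makes old feature redundant
                selected[name] = f
            return list(selected)
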
  • Partial Face Recognition: Alignment-Free Approach

    Page(s): 1193 - 1205

    Numerous methods have been developed for holistic face recognition with impressive performance. However, few studies have tackled how to recognize an arbitrary patch of a face image. Partial faces frequently appear in unconstrained scenarios, with images captured by surveillance cameras or handheld devices (e.g., mobile phones) in particular. In this paper, we propose a general partial face recognition approach that does not require face alignment by eye coordinates or any other fiducial points. We develop an alignment-free face representation method based on Multi-Keypoint Descriptors (MKD), where the descriptor size of a face is determined by the actual content of the image. In this way, any probe face image, holistic or partial, can be sparsely represented by a large dictionary of gallery descriptors. A new keypoint descriptor called Gabor Ternary Pattern (GTP) is also developed for robust and discriminative face recognition. Experimental results are reported on four public domain face databases (FRGCv2.0, AR, LFW, and PubFig) under both the open-set identification and verification scenarios. Comparisons with two leading commercial face recognition SDKs (PittPatt and FaceVACS) and two baseline algorithms (PCA+LDA and LBP) show that the proposed method, overall, is superior in recognizing both holistic and partial faces without requiring alignment.

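    A sketch of the sparse-representation step, assuming descriptors are already extracted (generic keypoint descriptors stand in for the paper's GTP, and the voting rule is an assumption): each probe descriptor is coded over the gallery dictionary with OMP, and coefficient magnitudes vote for identities:

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        def identify(probe_desc, gallery_desc, gallery_ids, sparsity=10):
            # gallery_desc: (d, n_atoms), one l2-normalized descriptor per column;
            # gallery_ids:  (n_atoms,) identity label of each atom;
            # probe_desc:   iterable of (d,) probe descriptors.
            omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity)
            scores = {}
            for x in probe_desc:
                coef = omp.fit(gallery_desc, x).coef_   # sparse code over gallery
                for gid in np.unique(gallery_ids):
                    scores[gid] = scores.get(gid, 0.0) + \
                        np.abs(coef[gallery_ids == gid]).sum()
            return max(scores, key=scores.get)          # best-supported identity
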
  • Self-Calibration of Catadioptric Camera with Two Planar Mirrors from Silhouettes

    Page(s): 1206 - 1220

    When an object is interreflected between two planar mirrors, a single pinhole camera can capture an image containing both the object and its multiple reflections, i.e., multiple views of the object in a single image. This paper addresses the problem of recovering both the intrinsic and extrinsic parameters of the camera using multiple silhouettes from one such image. View pairs among the views in a single image can be divided into two kinds according to the relationship between the two views in the pair: pairs related by reflection in some mirror (real or virtual) and pairs related by a circular motion. Epipoles of the first kind of pair can easily be determined from intersections of common tangent lines of silhouettes. Based on the projective properties of these epipoles, efficient methods are proposed to recover both the imaged circular points and the included angle between the two mirrors. Epipoles of the second kind of pair can be recovered simultaneously with the projection of the intersection line of the two mirrors by solving a simple 1D optimization problem using the consistency constraint of epipolar tangent lines. Fundamental matrices among the views in a single image are all recovered. Using the estimated intrinsic and extrinsic parameters of the camera, a Euclidean reconstruction can be obtained. Experiments validate the proposed approach.

  • Simultaneous Registration of Multiple Images: Similarity Metrics and Efficient Optimization

    Page(s): 1221 - 1233

    We address the alignment of a group of images with simultaneous registration. To this end, we provide further insights into a recently introduced framework for multivariate similarity measures, referred to as accumulated pair-wise estimates (APE), and derive efficient optimization methods for it. More specifically, we show a strict mathematical deduction of APE from a maximum-likelihood framework and establish a connection to the congealing framework. This is only possible after an extension of the congealing framework with neighborhood information. Moreover, we address the increased computational complexity of simultaneous registration by deriving efficient gradient-based optimization strategies for APE: Gauss-Newton and the efficient second-order minimization (ESM). Besides SSD, we show how intrinsically nonsquared similarity measures can be used in this least-squares optimization framework. The fundamental assumption of ESM, the approximation of the perfectly aligned moving image through the fixed image, limits its application to monomodal registration. We therefore incorporate recently proposed structural representations of images, which allow us to perform multimodal registration with ESM. Finally, we evaluate the performance of the optimization strategies with respect to the similarity measures, leading to very good results for ESM. The extension to multimodal registration is particularly interesting in this context because publicly available datasets with ground-truth alignment offer further possibilities for evaluation.

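    The APE construction with its simplest kernel, as a sketch: accumulate pairwise SSD over all ordered image pairs; groupwise registration would minimize this one scalar over all transformation parameters jointly (the Gauss-Newton/ESM machinery is omitted):

        import numpy as np

        def ape_ssd(images):
            # images: list of equally sized, already-warped 2D arrays.
            imgs = [i.astype(np.float64) for i in images]
            cost = 0.0
            for i in range(len(imgs)):
                for j in range(len(imgs)):
                    if i != j:                   # accumulated pair-wise estimates
                        cost += ((imgs[i] - imgs[j]) ** 2).sum()
            return cost
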
  • Spatially Varying Color Distributions for Interactive Multilabel Segmentation

    Page(s): 1234 - 1247

    We propose a method for interactive multilabel segmentation which explicitly takes into account the spatial variation of color distributions. To this end, we estimate a joint distribution over color and spatial location using a generalized Parzen density estimator applied to each user scribble. In this way, we obtain a likelihood for observing certain color values at a spatial coordinate. This likelihood is then incorporated in a Bayesian MAP estimation approach to multiregion segmentation which in turn is optimized using recently developed convex relaxation techniques. These guarantee global optimality for the two-region case (foreground/background) and solutions of bounded optimality for the multiregion case. We show results on the GrabCut benchmark, the recently published Graz benchmark, and on the Berkeley segmentation database which exceed previous approaches such as GrabCut [32], the Random Walker [15], Santner's approach [35], TV-Seg [39], and interactive graph cuts [4] in accuracy. Our results demonstrate that taking into account the spatial variation of color models leads to drastic improvements for interactive image segmentation.

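    A sketch of the spatially varying likelihood for one label, assuming isotropic Gaussian Parzen kernels over joint (position, color) with hand-picked bandwidths; the Bayesian MAP functional and its convex relaxation are not shown:

        import numpy as np

        def scribble_likelihood(xy, rgb, scrib_xy, scrib_rgb,
                                sigma_s=30.0, sigma_c=10.0):
            # xy: (N, 2) pixel coordinates, rgb: (N, 3) colors;
            # scrib_xy/scrib_rgb: the user's scribble samples for this label.
            d_s = ((xy[:, None, :] - scrib_xy[None]) ** 2).sum(-1)
            d_c = ((rgb[:, None, :] - scrib_rgb[None]) ** 2).sum(-1)
            k = np.exp(-0.5 * (d_s / sigma_s ** 2 + d_c / sigma_c ** 2))
            return k.mean(axis=1)   # same color can score differently by location
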
  • Tracking People's Hands and Feet Using Mixed Network AND/OR Search

    Page(s): 1248 - 1262

    We describe a framework that leverages mixed probabilistic and deterministic networks and their AND/OR search space to efficiently find and track the hands and feet of multiple interacting humans in 2D from a single camera view. Our framework detects and tracks multiple people's heads, hands, and feet through partial or full occlusion; requires few constraints (does not require multiple views, high image resolution, knowledge of performed activities, or large training sets); and makes use of constraints and AND/OR Branch-and-Bound with lazy evaluation and carefully computed bounds to efficiently solve the complex network that results from the consideration of interperson occlusion. Our main contributions are: 1) a multiperson part-based formulation that emphasizes extremities and allows for the globally optimal solution to be obtained in each frame, and 2) an efficient and exact optimization scheme that relies on AND/OR Branch-and-Bound, lazy factor evaluation, and factor cost sensitive bound computation. We demonstrate our approach on three datasets: the public single person HumanEva dataset, outdoor sequences where multiple people interact in a group meeting scenario, and outdoor one-on-one basketball videos. The first dataset demonstrates that our framework achieves state-of-the-art performance in the single person setting, while the last two demonstrate robustness in the presence of partial and full occlusion and fast nontrivial motion.

  • Unified Detection and Tracking of Instruments during Retinal Microsurgery

    Page(s): 1263 - 1273

    Methods for tracking an object have generally fallen into two groups: tracking by detection and tracking through local optimization. The advantage of detection-based tracking is its ability to deal with target appearance and disappearance, but it does not naturally take advantage of target motion continuity during detection. The advantage of local optimization is efficiency and accuracy, but it requires additional algorithms to initialize tracking when the target is lost. To bridge these two approaches, we propose a framework for unified detection and tracking as a time-series Bayesian estimation problem. The basis of our approach is to treat both detection and tracking as a sequential entropy minimization problem, where the goal is to determine the parameters describing a target in each frame. To do this we integrate the Active Testing (AT) paradigm with Bayesian filtering, and this results in a framework capable of both detecting and tracking robustly in situations where the target object enters and leaves the field of view regularly. We demonstrate our approach on a retinal tool tracking problem and show through extensive experiments that our method provides an efficient and robust tracking solution.

  • Schroedinger Eigenmaps for the Analysis of Biomedical Data

    Page(s): 1274 - 1280

    We introduce Schroedinger Eigenmaps (SE), a new semi-supervised manifold learning and recovery technique. This method is based on an implementation of graph Schroedinger operators with appropriately constructed barrier potentials as carriers of labeled information. We use our approach for the analysis of standard biomedical datasets and new multispectral retinal images.

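    A compact sketch of the construction, assuming a simple diagonal barrier potential placed at labeled samples (the paper's potentials and their use as label carriers are richer than this):

        import numpy as np
        from scipy.sparse import diags
        from scipy.sparse.csgraph import laplacian
        from scipy.sparse.linalg import eigsh
        from sklearn.neighbors import kneighbors_graph

        def schroedinger_eigenmaps(X, labeled_idx, alpha=100.0, k=10, dim=2):
            W = kneighbors_graph(X, k, mode='connectivity', include_self=False)
            W = 0.5 * (W + W.T)                    # symmetrize the kNN graph
            L = laplacian(W, normed=True)
            v = np.zeros(X.shape[0])
            v[labeled_idx] = 1.0                   # barrier potential at labels
            H = L + alpha * diags(v)               # graph Schroedinger operator
            vals, vecs = eigsh(H, k=dim + 1, which='SM')
            return vecs[:, 1:dim + 1]              # drop the lowest eigenvector
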
  • [Back inside cover]

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in the areas within TPAMI's scope.

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois