IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 2 • Date Feb. 2008

  • [Front cover]

    Page(s): c1
PDF (136 KB) | Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Save to Project icon | Request Permissions | PDF file iconPDF (83 KB)  
    Freely Available from IEEE
Editorial: State of the Transactions

    Page(s): 193 - 194
PDF (37 KB) | Freely Available from IEEE
  • Introduction of New Editors

    Page(s): 195 - 196
PDF (121 KB) | Freely Available from IEEE
  • A Theory Of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency

    Page(s): 197 - 213
PDF (2168 KB) | HTML

This paper develops a theory of frequency domain invariants in computer vision. We derive novel identities using spherical harmonics, which are the angular frequency domain analog to common spatial domain invariants such as reflectance ratios. These invariants are derived from the spherical harmonic convolution framework for reflection from a curved surface. Our identities apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. For this case, we derive a novel identity, independent of the specific lighting configurations or BRDFs, that allows us to directly estimate the fourth image if the other three are available. The identity can also be used as an invariant to detect tampering in the images. Although this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image to detect tampering or image splicing.
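The fourth-image estimate rests on a product identity in the angular frequency domain. The following sketch checks it numerically under the convolution model alone (each image coefficient is the product of a BRDF filter coefficient and a lighting coefficient); the coefficient values are made up for illustration, not taken from the paper.

```python
import numpy as np

# Convolution model: I_lm = rho_l * L_lm, where rho_l are BRDF filter
# coefficients and L_lm are lighting coefficients. All values below are
# illustrative stand-ins (offset from zero to keep the division stable).
rng = np.random.default_rng(0)
rho_A, rho_B = 0.1 + rng.random(16), 0.1 + rng.random(16)  # two BRDFs
L1, L2 = 0.1 + rng.random(16), 0.1 + rng.random(16)        # two lightings

# Four "images" of objects A, B under lightings 1, 2 (frequency domain).
I_A1, I_A2 = rho_A * L1, rho_A * L2
I_B1, I_B2 = rho_B * L1, rho_B * L2

# Identity: I_A1 * I_B2 = I_A2 * I_B1, independent of the BRDFs and
# lightings, so the fourth image follows from the other three.
I_B2_est = I_A2 * I_B1 / I_A1
assert np.allclose(I_B2_est, I_B2)
```

A violation of the same identity on measured coefficients would flag tampering, which is the consistency-checking use the abstract mentions.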
  • Three-View Multibody Structure from Motion

    Page(s): 214 - 227
PDF (2027 KB) | HTML

We propose a geometric approach to 3D motion segmentation from point correspondences in three perspective views. We demonstrate that after applying a polynomial embedding to the point correspondences, they become related by the so-called multibody trilinear constraint and its associated multibody trifocal tensor, which are natural generalizations of the trilinear constraint and the trifocal tensor to multiple motions. We derive a rank constraint on the embedded correspondences from which one can estimate the number of independent motions, as well as linearly solve for the multibody trifocal tensor. We then show how to compute the epipolar lines associated with each image point from the common root of a set of univariate polynomials and the epipoles by solving a pair of plane clustering problems using Generalized Principal Component Analysis (GPCA). The individual trifocal tensors are then obtained from the second-order derivatives of the multibody trilinear constraint. Given epipolar lines and epipoles or trifocal tensors, one can immediately obtain an initial clustering of the correspondences. We use this clustering to initialize an iterative algorithm that alternates between the computation of the trifocal tensors and the segmentation of the correspondences. We test our algorithm on various synthetic and real scenes and compare it with other algebraic and iterative algorithms.
  • A Closed-Form Solution to Natural Image Matting

    Page(s): 228 - 242
PDF (4469 KB) | HTML

Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed: at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte"), from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.
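The final step the abstract describes, solving a sparse linear system for the globally optimal alpha, can be sketched in a few lines. This is only the solve stage under stated assumptions: the stand-in matrix below is a simple chain-graph Laplacian, not the paper's color-line matting Laplacian, and the constraint weight `lam` is an arbitrary illustrative value.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# Given a (precomputed) matting Laplacian L, user scribbles enter as a
# diagonal penalty D on constrained pixels, and the optimal alpha solves
#     (L + lam * D) alpha = lam * D b
# Stand-in: a 6-pixel chain Laplacian instead of the real matting Laplacian.
n = 6
main = 2.0 * np.ones(n)
main[0] = main[-1] = 1.0
L = diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format="csr")

constrained = np.array([1, 0, 0, 0, 0, 1], dtype=float)  # scribbled pixels
b = np.array([0, 0, 0, 0, 0, 1], dtype=float)            # bg = 0, fg = 1
lam = 100.0

D = diags(constrained, format="csr")
alpha = spsolve(L + lam * D, lam * (D @ b))
print(np.round(alpha, 3))  # rises smoothly from ~0 to ~1
```

The unconstrained pixels interpolate smoothly between the scribbled values, which is the qualitative behavior the smoothness cost enforces.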
  • K-Nearest Neighbor Finding Using MaxNearestDist

    Page(s): 243 - 252
PDF (988 KB) | HTML

Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth-first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters that are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search process by avoiding the processing of some of the clusters, as well as individual objects, when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist) is described. The MaxNearestDist upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (that is, k = 1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased, thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements.
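The interplay of the two bounds can be illustrated on a flat, two-level hierarchy. This is a simplified sketch, not the paper's algorithm: it handles only k = 1, and the cluster summaries (center, radius) are assumptions of the example. MinDist orders and prunes clusters; MaxNearestDist (distance to a cluster center plus its radius) bounds the answer before any object in the cluster is examined.

```python
import heapq
import numpy as np

def min_dist(q, center, radius):
    # Lower bound on the distance from q to any object in the ball.
    return max(np.linalg.norm(q - center) - radius, 0.0)

def max_nearest_dist(q, center, radius):
    # Upper bound on the distance from q to the nearest object in the
    # (nonempty) ball: some object lies within dist(q, center) + radius.
    return np.linalg.norm(q - center) + radius

def nearest(q, clusters):
    """clusters: list of (center, radius, points). Returns the NN distance."""
    # Initialize the pruning radius from cluster summaries alone.
    best = min(max_nearest_dist(q, c, r) for c, r, _ in clusters)
    heap = [(min_dist(q, c, r), i) for i, (c, r, _) in enumerate(clusters)]
    heapq.heapify(heap)
    while heap:
        lb, i = heapq.heappop(heap)
        if lb > best:          # MinDist prune: nothing closer remains
            break
        for p in clusters[i][2]:
            best = min(best, np.linalg.norm(q - p))
    return best
```

Because `best` is already tight before any points are scanned, whole clusters can be skipped without ever reading their contents.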
  • Weighted Pseudometric Discriminatory Power Improvement Using a Bayesian Logistic Regression Model Based on a Variational Method

    Page(s): 253 - 266
PDF (1989 KB) | HTML

In this paper, we investigate the effectiveness of a Bayesian logistic regression model to compute the weights of a pseudometric in order to improve its discriminatory capacity and thereby increase image retrieval accuracy. In the proposed Bayesian model, the prior knowledge of the observations is incorporated and the posterior distribution is approximated by a tractable Gaussian form using variational transformation and Jensen's inequality, which allow a fast and straightforward computation of the weights. The pseudometric makes use of the compressed and quantized versions of wavelet decomposed feature vectors, and in our previous work, the weights were adjusted by the classical logistic regression model. A comparative evaluation of the Bayesian and classical logistic regression models is performed for content-based image retrieval, as well as for other classification tasks, in a decontextualized evaluation framework. In this same framework, we compare the Bayesian logistic regression model to some relevant state-of-the-art classification algorithms. Experimental results show that the Bayesian logistic regression model outperforms these linear classification algorithms and is a significantly better tool than the classical logistic regression model to compute the pseudometric weights and improve retrieval and classification performance. Finally, we perform a comparison with results obtained by other retrieval methods.
  • Multicamera People Tracking with a Probabilistic Occupancy Map

    Page(s): 267 - 282
PDF (3565 KB) | HTML

Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.
  • Gaussian Process Dynamical Models for Human Motion

    Page(s): 283 - 298
PDF (1893 KB) | HTML

We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
  • Automatic Estimation and Removal of Noise from a Single Image

    Page(s): 299 - 314
PDF (5424 KB) | HTML

Image denoising algorithms often assume an additive white Gaussian noise (AWGN) process that is independent of the actual RGB values. Such approaches cannot effectively remove color noise produced by today's CCD digital cameras. In this paper, we propose a unified framework for two tasks: automatic estimation and removal of color noise from a single image using piecewise smooth image models. We introduce the noise level function (NLF), which is a continuous function describing the noise level as a function of image brightness. We then estimate an upper bound of the real NLF by fitting a lower envelope to the standard deviations of per-segment image variances. For denoising, the chrominance of color noise is significantly removed by projecting pixel values onto a line fit to the RGB values in each segment. Then, a Gaussian conditional random field (GCRF) is constructed to obtain the underlying clean image from the noisy input. Extensive experiments are conducted to test the proposed algorithm, which is shown to outperform state-of-the-art denoising algorithms.
  • Plane-Based Optimization for 3D Object Reconstruction from Single Line Drawings

    Page(s): 315 - 327
PDF (1783 KB) | HTML

In previous optimization-based methods of 3D planar-faced object reconstruction from single 2D line drawings, the missing depths of the vertices of a line drawing (and other parameters in some methods) are used as the variables of the objective functions. A 3D object with planar faces is derived by finding values for these variables that minimize the objective functions. These methods work well for simple objects with a small number N of variables. As N grows, however, it is very difficult for them to find the expected objects. This is because with the nonlinear objective functions in a space of large dimension N, the search for optimal solutions can easily get trapped into local minima. In this paper, we use the parameters of the planes that pass through the planar faces of an object as the variables of the objective function. This leads to a set of linear constraints on the planes of the object, resulting in a much lower dimensional null space where optimization is easier to achieve. We prove that the dimension of this null space is exactly equal to the minimum number of vertex depths that define the 3D object. Since a practical line drawing is usually not an exact projection of a 3D object, we expand the null space to a larger space based on the singular value decomposition of the projection matrix of the line drawing. In this space, robust 3D reconstruction can be achieved. Compared with the two most related methods, our method not only can reconstruct more complex 3D objects from 2D line drawings but also is computationally more efficient. View full abstract»
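The key reduction above, optimizing in the null space of linear constraints, can be sketched in a few lines. The constraint matrix below is an arbitrary illustrative stand-in, not the projection matrix of an actual line drawing.

```python
import numpy as np

# Linear constraints A x = 0 confine the plane parameters x to the null
# space of A, so optimization runs over low-dimensional coordinates y
# with x = N y. The matrix A here is a made-up example.
A = np.array([[1., -1., 0., 0.],
              [0., 0., 1., -1.]])

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T            # orthonormal basis of the null space (4 x 2)

y = np.array([2.0, -3.0])  # the optimizer's variables live here
x = N @ y
assert np.allclose(A @ x, 0)  # every y satisfies the constraints exactly
```

Any search over `y` automatically respects the planarity constraints, which is why the lower-dimensional problem is easier to optimize; the paper additionally enlarges this space via the SVD of the (noisy) projection matrix.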
  • Stereo Processing by Semiglobal Matching and Mutual Information

    Page(s): 328 - 341
PDF (5087 KB) | HTML

This paper describes the semiglobal matching (SGM) stereo method. It uses a pixelwise, mutual information (MI)-based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions. The discussion also addresses occlusion detection, subpixel refinement, and multibaseline matching. Additionally, postprocessing steps for removing outliers, recovering from specific problems of structured environments, and the interpolation of gaps are presented. Finally, strategies for processing almost arbitrarily large images and fusion of disparity images using orthographic projection are proposed. A comparison on standard stereo images shows that SGM is among the currently top-ranked algorithms and is best if subpixel accuracy is considered. The complexity is linear in the number of pixels and the disparity range, which results in a runtime of just 1-2 seconds on typical test images. An in-depth evaluation of the MI-based matching cost demonstrates tolerance to a wide range of radiometric transformations. Finally, examples of reconstructions from huge aerial frame and pushbroom images demonstrate that the presented ideas work well on practical problems.
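SGM's pathwise optimization admits a compact sketch for a single direction. The recurrence below follows the standard SGM aggregation (same disparity free, a change of one disparity penalized by P1, larger jumps by P2), applied left to right on one scanline; the cost volume and penalty values are illustrative.

```python
import numpy as np

def aggregate_lr(C, P1=1.0, P2=4.0):
    """Aggregate a (width x disparities) cost volume C along one path
    (left to right). One of the several path directions SGM sums."""
    W, D = C.shape
    L = np.zeros_like(C, dtype=float)
    L[0] = C[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        shift_up = np.r_[prev[1:], np.inf]   # previous cost at d + 1
        shift_dn = np.r_[np.inf, prev[:-1]]  # previous cost at d - 1
        # Best way to reach disparity d: stay, step by one (P1), or jump (P2).
        step = np.minimum.reduce([prev,
                                  shift_up + P1,
                                  shift_dn + P1,
                                  np.full(D, m + P2)])
        L[x] = C[x] + step - m  # subtracting m keeps the values bounded
    return L
```

The full method sums such aggregated costs over paths from all directions and takes the per-pixel disparity with minimum total cost.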
  • Likelihood Ratio-Based Biometric Score Fusion

    Page(s): 342 - 347
PDF (1742 KB) | HTML

Multibiometric systems fuse information from different sources to compensate for the limitations in performance of individual matchers. We propose a framework for the optimal combination of match scores that is based on the likelihood ratio test. The distributions of genuine and impostor match scores are modeled as finite Gaussian mixture models. The proposed fusion approach is general in its ability to handle 1) discrete values in biometric match score distributions, 2) arbitrary scales and distributions of match scores, 3) correlation between the scores of multiple matchers, and 4) sample quality of multiple biometric sources. Experiments on three multibiometric databases indicate that the proposed fusion framework achieves consistently high performance compared to commonly used score fusion techniques based on score transformation and classification.
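The core of the framework, a likelihood ratio test over mixture-modeled score distributions, can be sketched with synthetic scores. Everything below (score distributions, mixture sizes, the example inputs) is an illustrative stand-in, not the paper's data or tuning.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic match scores from two matchers for genuine and impostor trials.
rng = np.random.default_rng(0)
genuine = rng.normal([70.0, 60.0], 5.0, size=(500, 2))
impostor = rng.normal([30.0, 25.0], 8.0, size=(500, 2))

# Model each class's score distribution with a finite Gaussian mixture.
gen_gmm = GaussianMixture(n_components=2, random_state=0).fit(genuine)
imp_gmm = GaussianMixture(n_components=2, random_state=0).fit(impostor)

def log_likelihood_ratio(scores):
    # log p(s | genuine) - log p(s | impostor); accept when above a
    # threshold chosen for the desired false-accept rate.
    return gen_gmm.score_samples(scores) - imp_gmm.score_samples(scores)
```

Sweeping the decision threshold on the log-likelihood ratio traces out the system's ROC curve, so the operating point is tuned without re-fitting the mixtures.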
  • MultiK-MHKS: A Novel Multiple Kernel Learning Algorithm

    Page(s): 348 - 353
PDF (1447 KB) | HTML

In this paper, we develop a new effective multiple kernel learning algorithm. First, we map the input data into m different feature spaces by m empirical kernels, where each generated feature space is taken as one view of the input space. Then, through borrowing the motivating argument from Canonical Correlation Analysis (CCA) that can maximally correlate the m views in the transformed coordinates, we introduce a special term called Inter-Function Similarity Loss (R_IFSL) into the existing regularization framework so as to guarantee the agreement of multiview outputs. In implementation, we select the Modification of Ho-Kashyap algorithm with Squared approximation of the misclassification errors (MHKS) as the incorporated paradigm, and the experimental results on benchmark data sets demonstrate the feasibility and effectiveness of the proposed algorithm, named MultiK-MHKS.
  • Bayesian-Competitive Consistent Labeling for People Surveillance

    Page(s): 354 - 360
PDF (2374 KB) | HTML

This paper presents a novel and robust approach to consistent labeling for people surveillance in multicamera systems. A general framework scalable to any number of cameras with overlapped views is devised. An offline training process automatically computes ground-plane homography and recovers epipolar geometry. When a new object is detected in any one camera, hypotheses for potential matching objects in the other cameras are established. Each of the hypotheses is evaluated using a prior and likelihood value. The prior accounts for the positions of the potential matching objects, while the likelihood is computed by warping the vertical axis of the new object on the field of view of the other cameras and measuring the amount of match. In the likelihood, two contributions (forward and backward) are considered so as to correctly handle the case of groups of people merged into single objects. Eventually, a maximum-a-posteriori approach estimates the best label assignment for the new object. Comparisons with other methods based on homography and extensive outdoor experiments demonstrate that the proposed approach is accurate and robust in coping with segmentation errors and in disambiguating groups.
  • Trajectory Association across Multiple Airborne Cameras

    Page(s): 361 - 367
PDF (1074 KB) | HTML

A camera mounted on an aerial vehicle provides an excellent means to monitor large areas of a scene. Utilizing several such cameras on different aerial vehicles allows further flexibility in terms of increased visual scope and in the pursuit of multiple targets. In this paper, we address the problem of associating trajectories across multiple moving airborne cameras. We exploit geometric constraints on the relationship between the motion of each object across cameras without assuming any prior calibration information. Since multiple cameras exist, ensuring coherency in association is an essential requirement, e.g., that transitive closure is maintained between more than two cameras. To ensure such coherency, we pose the problem of maximizing the likelihood function as a k-dimensional matching and use an approximation to find the optimal assignment of association. Using the proposed error function, canonical trajectories of each object and optimal estimates of intercamera transformations (in a maximum likelihood sense) are computed. Finally, we show that, as a result of associating trajectories across the cameras, under special conditions, trajectories interrupted due to occlusion or missing detections can be repaired. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models, and, through simulation, quantitative performance is also reported.
  • TPAMI Information for authors

    Page(s): c3
PDF (83 KB) | Freely Available from IEEE
  • [Back cover]

    Page(s): c4
PDF (136 KB) | Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois