
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 1 • Date Jan. 2010

  • [Front cover]

    Page(s): c1
    PDF (178 KB) | Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (142 KB) | Freely Available from IEEE
  • The 30th Anniversary of the IEEE Transactions on Pattern Analysis and Machine Intelligence

    Page(s): 1
    PDF (66 KB) | Freely Available from IEEE
  • Accurate Image Search Using the Contextual Dissimilarity Measure

    Page(s): 2 - 11
    PDF (2868 KB) | HTML

    This paper introduces the contextual dissimilarity measure, which significantly improves the accuracy of bag-of-features-based image search. Our measure takes into account the local distribution of the vectors and iteratively estimates distance update terms in the spirit of Sinkhorn's scaling algorithm, thereby modifying the neighborhood structure. Experimental results show that our approach gives significantly better results than a standard distance and outperforms the state of the art in terms of accuracy on the Nistér-Stewénius and Lola data sets. This paper also evaluates the impact of a large number of parameters, including the number of descriptors, the clustering method, the visual vocabulary size, and the distance measure. The optimal parameter choice is shown to be quite context-dependent. In particular, using a large number of descriptors is interesting only when using our dissimilarity measure. We have also evaluated two novel variants: multiple assignment and rank aggregation. They are shown to further improve accuracy at the cost of higher memory usage and lower efficiency.

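The iterative update at the heart of the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' exact formulation: the function name, the update exponent `alpha`, and the use of a plain pairwise distance matrix are all assumptions made for the sketch.

```python
import numpy as np

def contextual_dissimilarity(D, k=10, n_iter=5, alpha=0.5):
    """Simplified CDM-style update: iteratively rescale distances so that
    each point's mean distance to its k nearest neighbors approaches the
    global mean, in the spirit of Sinkhorn's scaling algorithm."""
    D = np.asarray(D, float).copy()
    np.fill_diagonal(D, np.inf)                      # exclude self-matches from k-NN
    for _ in range(n_iter):
        r = np.sort(D, axis=1)[:, :k].mean(axis=1)   # mean k-NN distance per point
        delta = (r.mean() / r) ** alpha              # per-point update term
        D *= np.sqrt(np.outer(delta, delta))         # symmetric correction
    np.fill_diagonal(D, 0.0)
    return D
```

Each pass shrinks distances around points in dense regions and stretches them in sparse ones, which is the neighborhood-regularizing effect the paper attributes to its measure.
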
  • Automatic Range Image Registration in the Markov Chain

    Page(s): 12 - 29
    Multimedia
    PDF (4684 KB) | HTML

    In this paper, a novel entropy that can describe both long and short-tailed probability distributions of constituents of a thermodynamic system out of its thermodynamic limit is first derived from the Lyapunov function for a Markov chain. We then maximize this entropy for the estimation of the probabilities of possible correspondences established using the traditional closest point criterion between two overlapping range images. When we change our viewpoint to look carefully at the minimum solution to the probability estimate of the correspondences, the iterative range image registration process can also be modeled as a Markov chain in which lessons from past experience in estimating those probabilities are learned. To impose the two-way constraint, outliers are explicitly modeled due to the almost ubiquitous occurrence of occlusion, appearance, and disappearance of points in either image. The estimated probabilities of the correspondences are finally embedded into the powerful mean field annealing scheme for global optimization, leading the camera motion parameters to be estimated in the weighted least-squares sense. A comparative study using real images shows that the proposed algorithm usually outperforms the state-of-the-art ICP variants and the latest genetic algorithm for automatic overlapping range image registration.

  • A Boosting Framework for Visuality-Preserving Distance Metric Learning and Its Application to Medical Image Retrieval

    Page(s): 30 - 44
    PDF (2770 KB) | HTML

    Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, "similarity" can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one goal without consideration of the other. This is problematic for medical image retrieval where the goal is to assist doctors in decision making. In these applications, given a query image, the goal is to retrieve similar images from a reference library whose semantic annotations could provide the medical professional with greater insight into the possible interpretations of the query image. If the system were to retrieve images that did not look like the query, then users would be less likely to trust the system; on the other hand, retrieving images that appear superficially similar to the query but are semantically unrelated is undesirable because that could lead users toward an incorrect diagnosis. Hence, learning a distance metric that preserves both visual resemblance and semantic similarity is important. We emphasize that, although our study is focused on medical image retrieval, the problem addressed in this work is critical to many image retrieval systems. We present a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities. The boosting framework first learns a binary representation using side information, in the form of labeled pairs, and then computes the distance as a weighted Hamming distance using the learned binary representation. A boosting algorithm is presented to efficiently learn the distance function. We evaluate the proposed algorithm on a mammographic image reference library with an interactive search-assisted decision support (ISADS) system and on the medical image data set from ImageCLEF. Our results show that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost. Additional evaluation with the COREL collection shows that our algorithm works well for regular image data sets.

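The distance computation described above, a weighted Hamming distance over a learned binary representation, is simple to state in code. The function below is a generic sketch; the binary codes and per-bit weights would come from the paper's boosting procedure, which is not reproduced here.

```python
import numpy as np

def weighted_hamming(x, y, w):
    """Weighted Hamming distance between two binary codes: the sum of
    the weights of the bit positions where the codes disagree."""
    x, y, w = (np.asarray(a) for a in (x, y, w))
    return float(np.sum(w * (x != y)))
```

At retrieval time, the query image is encoded once and the library is ranked by this distance, so the per-comparison cost is a handful of elementwise operations, consistent with the low computational cost the abstract claims.
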
  • Convex and Semi-Nonnegative Matrix Factorizations

    Page(s): 45 - 55
    PDF (879 KB) | HTML

    We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG^T, we focus on algorithms in which G is restricted to contain nonnegative entries while the data matrix X is allowed to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.

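A minimal sketch of the semi-NMF setting described above (X ≈ FG^T with G nonnegative and X, F of mixed sign), using the alternating updates commonly associated with this factorization: an unconstrained least-squares step for F and a multiplicative step for G built from positive and negative parts. Function name, initialization, and iteration count are assumptions of the sketch.

```python
import numpy as np

def semi_nmf(X, k, n_iter=100, seed=0, eps=1e-9):
    """Sketch of semi-NMF: X (p x n) ~ F @ G.T with G >= 0 (n x k)
    and F (p x k) unconstrained, so X may have mixed signs."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    G = np.abs(rng.standard_normal((n, k)))        # nonnegative factor

    def pos(A):                                    # elementwise positive part
        return (np.abs(A) + A) / 2

    def neg(A):                                    # elementwise negative part
        return (np.abs(A) - A) / 2

    for _ in range(n_iter):
        # F solves the unconstrained least-squares problem given G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF, FtF = X.T @ F, F.T @ F
        # multiplicative update keeps G nonnegative by construction
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G
```

Because the F-step is a least-squares projection, the reconstruction F @ G.T can never have larger Frobenius error than X itself, which makes the factorization easy to sanity-check.
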
  • Correspondence-Free Activity Analysis and Scene Modeling in Multiple Camera Views

    Page(s): 56 - 71
    PDF (4899 KB) | HTML

    We propose a novel approach for activity analysis in multiple synchronized but uncalibrated static camera views. In this paper, we refer to activities as motion patterns of objects, which correspond to paths in far-field scenes. We assume that the topology of cameras is unknown and quite arbitrary, the fields of views covered by these cameras may have no overlap or any amount of overlap, and objects may move on different ground planes. Using low-level cues, objects are first tracked in each camera view independently, and the positions and velocities of objects along trajectories are computed as features. Under a probabilistic model, our approach jointly learns the distribution of an activity in the feature spaces of different camera views. Then, it accomplishes the following tasks: 1) grouping trajectories, which belong to the same activity but may be in different camera views, into one cluster; 2) modeling paths commonly taken by objects across multiple camera views; and 3) detecting abnormal activities. Advantages of this approach are that it does not require first solving the challenging correspondence problem, and that learning is unsupervised. Even though correspondence is not a prerequisite, after the models of activities have been learned, they can help to solve the correspondence problem, since if two trajectories in different camera views belong to the same activity, they are likely to correspond to the same object. Our approach is evaluated on a simulated data set and two very large real data sets, which have 22,951 and 14,985 trajectories, respectively.

  • Differential Geometric Inference in Surface Stereo

    Page(s): 72 - 86
    PDF (3336 KB) | HTML

    Many traditional two-view stereo algorithms explicitly or implicitly use the frontal parallel plane assumption when exploiting contextual information since, e.g., the smoothness prior biases toward constant disparity (depth) over a neighborhood. This introduces systematic errors to the matching process for slanted or curved surfaces. These errors are nonnegligible for detailed geometric modeling of natural objects such as a human face. We show how to use contextual information geometrically to avoid such errors. A differential geometric study of smooth surfaces allows contextual information to be encoded in Cartan's moving frame model over local quadratic approximations, providing a framework of geometric consistency for both depth and surface normals; the accuracy of our reconstructions argues for the sufficiency of the approximation. In effect, Cartan's model provides the additional constraint necessary to move beyond the frontal parallel plane assumption in stereo reconstruction. It also suggests how geometry can extend surfaces to account for unmatched points due to partial occlusion.

  • Direct Estimation of Nonrigid Registrations with Image-Based Self-Occlusion Reasoning

    Page(s): 87 - 104
    PDF (6153 KB) | HTML

    The registration problem for images of a deforming surface has been well studied. External occlusions are usually well handled. In 2D image-based registration, self-occlusions are more challenging. Consequently, the surface is usually assumed to be only slightly self-occluding. This paper is about image-based nonrigid registration with self-occlusion reasoning. A specific framework explicitly modeling self-occlusions is proposed. It is combined with an intensity-based, "direct" data term for registration. Self-occlusions are detected as shrinkage areas in the 2D warp. Experimental results on several challenging data sets show that our approach successfully registers images with self-occlusions while effectively detecting the self-occluded regions.

  • Faster and Better: A Machine Learning Approach to Corner Detection

    Page(s): 105 - 119
    Multimedia
    PDF (2998 KB)

    The repeatability and efficiency of a corner detector determines how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate. Three advances are described in this paper. First, we present a new heuristic for feature detection and, using machine learning, we derive a feature detector from this which can fully process live PAL video using less than 5 percent of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115 percent, SIFT 195 percent). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that, despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and of very high quality.

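The heuristic underlying the detector above is a segment test: a pixel is a corner candidate if a long contiguous arc of pixels on a surrounding circle is uniformly brighter or darker than the center. The sketch below illustrates that test directly; the paper's contribution is a machine-learned decision tree that evaluates it far faster, which is not reproduced here, and the threshold and arc-length defaults are assumptions.

```python
import numpy as np

# Offsets (dx, dy) of the 16-pixel circle of radius 3 used by the segment test.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def segment_test(img, y, x, t=20, n=12):
    """Return True if at least n contiguous circle pixels are all brighter
    than center+t or all darker than center-t (wrapping around the circle)."""
    c = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    brighter = [v > c + t for v in ring]
    darker = [v < c - t for v in ring]
    for flags in (brighter, darker):
        run = 0
        for f in flags + flags:        # doubled list handles wrap-around runs
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```

A full detector would run this test at every pixel and then apply non-maximum suppression on a corner score; the learned tree in the paper reorders and prunes the 16 comparisons so that most non-corners are rejected after only a few pixel reads.
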
  • On the Decoding Process in Ternary Error-Correcting Output Codes

    Page(s): 120 - 134
    PDF (3288 KB)

    A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-correcting output codes (ECOC) represent a successful framework to deal with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows us to ignore some classes by a given classifier. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI machine learning repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.

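One simple way to see the decoding issue raised above: with ternary codewords, classes whose codewords contain many zeros accumulate fewer possible disagreements, so a raw Hamming count is biased. The sketch below counts disagreements only at active positions and normalizes by their number; it is an illustrative decoding rule for the ternary setting, not one of the paper's proposed strategies.

```python
import numpy as np

def ternary_ecoc_decode(M, preds):
    """Decode a ternary ECOC prediction. M is the coding matrix
    (classes x dichotomies, entries in {-1, 0, +1}); preds holds the
    binary classifier outputs (entries in {-1, +1}). Each class is
    scored by its fraction of disagreements over non-zero positions."""
    M = np.asarray(M, float)
    preds = np.asarray(preds, float)
    scores = []
    for row in M:
        active = row != 0                            # ignore "do not care" symbols
        mismatches = np.sum(row[active] != preds[active])
        scores.append(mismatches / max(active.sum(), 1))
    return int(np.argmin(scores))                    # class with best agreement
```

With a one-versus-one coding matrix, for example, each codeword has zeros at every dichotomy that does not involve that class, and the normalization keeps those classes comparable.
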
  • Structural Approach for Building Reconstruction from a Single DSM

    Page(s): 135 - 147
    PDF (5110 KB)

    We present a new approach for building reconstruction from a single Digital Surface Model (DSM). It treats buildings as an assemblage of simple urban structures extracted from a library of 3D parametric blocks (like a LEGO set). First, the 2D-supports of the urban structures are extracted either interactively or automatically. Then, 3D-blocks are placed on the 2D-supports using a Gibbs model which controls both the block assemblage and the fitting to data. A Bayesian decision finds the optimal configuration of 3D-blocks using a Markov Chain Monte Carlo sampler associated with original proposition kernels. This method has been validated on multiple data sets over a wide resolution interval, such as 0.7 m satellite and 0.1 m aerial DSMs, and provides 3D representations of complex buildings and dense urban areas with various levels of detail.

  • Using Language to Learn Structured Appearance Models for Image Annotation

    Page(s): 148 - 164
    PDF (3219 KB)

    Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure, we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. Results of applying our method to three data sets in a variety of conditions demonstrate that, from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.

  • Fast Algorithm for Walsh Hadamard Transform on Sliding Windows

    Page(s): 165 - 171
    PDF (2600 KB)

    This paper proposes a fast algorithm for Walsh Hadamard Transform on sliding windows which can be used to implement pattern matching most efficiently. The computational requirement of the proposed algorithm is about 1.5 additions per projection vector per sample, which is the lowest among existing fast algorithms for Walsh Hadamard Transform on sliding windows.

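For reference, the classical in-place fast Walsh-Hadamard transform is shown below; it costs on the order of N log2 N additions per window. The paper's algorithm is different and cheaper in the sliding-window setting (about 1.5 additions per projection vector per sample) and is not reproduced here.

```python
import numpy as np

def fwht(a):
    """Classical fast Walsh-Hadamard transform in natural (Hadamard)
    order, for input length N a power of two. Uses the standard
    butterfly recursion: N log2(N) additions/subtractions in place."""
    a = np.asarray(a, float).copy()
    n = a.size
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y    # butterfly step
        h *= 2
    return a
```

Since the Hadamard matrix satisfies H H = N I, applying the transform twice recovers N times the input, which gives a quick correctness check.
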
  • Spatiotemporal Saliency in Dynamic Scenes

    Page(s): 171 - 177
    PDF (1669 KB)

    A spatiotemporal saliency algorithm based on a center-surround framework is proposed. The algorithm is inspired by biological mechanisms of motion-based perceptual grouping and extends a discriminant formulation of center-surround saliency previously proposed for static imagery. Under this formulation, the saliency of a location is equated to the power of a predefined set of features to discriminate between the visual stimuli in a center and a surround window, centered at that location. The features are spatiotemporal video patches and are modeled as dynamic textures, to achieve a principled joint characterization of the spatial and temporal components of saliency. The combination of discriminant center-surround saliency with the modeling power of dynamic textures yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras. The related problem of background subtraction is treated as the complement of saliency detection, by classifying nonsalient (with respect to appearance and motion dynamics) points in the visual field as background. The algorithm is tested for background subtraction on challenging sequences, and shown to substantially outperform various state-of-the-art techniques. Quantitatively, its average error rate is almost half that of the closest competitor.

  • A Generalized Kernel Consensus-Based Robust Estimator

    Page(s): 178 - 184
    PDF (1929 KB)

    In this paper, we present a new adaptive-scale kernel consensus (ASKC) robust estimator as a generalization of the popular and state-of-the-art robust estimators such as random sample consensus (RANSAC), adaptive scale sample consensus (ASSC), and maximum kernel density estimator (MKDE). The ASKC framework is grounded on and unifies these robust estimators using nonparametric kernel density estimation theory. In particular, we show that each of these methods is a special case of ASKC using a specific kernel. Like these methods, ASKC can tolerate more than 50 percent outliers, but it can also automatically estimate the scale of inliers. We apply ASKC to two important areas in computer vision, robust motion estimation and pose estimation, and show comparative results on both synthetic and real data.

  • 2009 Reviewers List

    Page(s): 185 - 190
    PDF (96 KB) | Freely Available from IEEE
  • Call for Papers for Special Issue on Real-World Face Recognition

    Page(s): 191
    PDF (43 KB) | Freely Available from IEEE
  • Call for Papers for New IEEE Transactions on Affective Computing

    Page(s): 192
    PDF (145 KB) | Freely Available from IEEE
  • TPAMI Information for authors

    Page(s): c3
    PDF (142 KB) | Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (178 KB) | Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois