
Pattern Analysis and Machine Intelligence, IEEE Transactions on

Issue 5 • Date May 2008

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • A Comparative Study of Staff Removal Algorithms

    Page(s): 753 - 766

    This paper presents a quantitative comparison of different algorithms for the removal of stafflines from music images. It contains a survey of previously proposed algorithms and suggests a new skeletonization-based approach. We define three different error metrics, compare the algorithms with respect to these metrics, and measure their robustness with respect to certain image defects. Our test images are computer-generated scores on which we apply various image deformations typically found in real-world data. In addition to modern western music notation, our test set also includes historic music notation such as mensural notation and lute tablature. Our general approach and evaluation methodology are not specific to staff removal but applicable to other segmentation problems as well.

  • Markov Random Field-Based Statistical Character Structure Modeling for Handwritten Chinese Character Recognition

    Page(s): 767 - 780

    This paper proposes a statistical-structural character modeling method based on Markov random fields (MRFs) for handwritten Chinese character recognition (HCCR). The stroke relationships of a Chinese character reflect its structure, which can be statistically represented by the neighborhood system and clique potentials within the MRF framework. Based on the prior knowledge of character structures, we design the neighborhood system that accounts for the most important stroke relationships. We penalize the structurally mismatched stroke relationships with MRFs using the prior clique potentials and derive the likelihood clique potentials from Gaussian mixture models, which encode the large variations of stroke relationships statistically. In the proposed HCCR system, we use the single-site likelihood clique potentials to extract many candidate strokes from character images and use the pair-site clique potentials to determine the best structural match between the input candidate strokes and the MRF-based character models by relaxation labeling. The experiments on the Korea Advanced Institute of Science and Technology (KAIST) character database demonstrate that MRFs can statistically model character structures, and work well in the HCCR system.

  • Superquadric Segmentation in Range Images via Fusion of Region and Boundary Information

    Page(s): 781 - 795

    The high potential of superquadrics as modeling elements for image segmentation tasks has been pointed out for years in the computer vision community. In this work, we employ superquadrics as modeling elements for multiple object segmentation in range images. Segmentation is executed in two stages: First, a hypothesis about the values of the segmentation parameters is generated. Second, the hypothesis is refined locally. In both stages, object boundary and region information are considered. Boundary information is derived via model-based edge detection in the input range image. Hypothesis generation uses boundary information to isolate image regions that can be accurately described by superquadrics. Within hypothesis refinement, a game-theoretic framework is used to fuse the two information sources by associating an objective function to each information source. Iterative optimization of the two objective functions in succession outputs a precise description of all image objects. We demonstrate experimentally that this approach substantially improves on the most established method in superquadric segmentation in terms of accuracy and computational efficiency. We demonstrate the applicability of our segmentation framework in real-world applications by constructing a novel robotic system for automatic unloading of jumbled box-like objects from platforms.

  • Riemannian Manifold Learning

    Page(s): 796 - 809

    Recently, manifold learning has been widely exploited in pattern recognition, data analysis, and machine learning. This paper presents a novel framework, called Riemannian manifold learning (RML), based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold. The main idea is to formulate the dimensionality reduction problem as a classical problem in Riemannian geometry, that is, how to construct coordinate charts for a given Riemannian manifold? We implement the Riemannian normal coordinate chart, which has been the most widely used in Riemannian geometry, for a set of unorganized data points. First, two input parameters (the neighborhood size k and the intrinsic dimension d) are estimated based on an efficient simplicial reconstruction of the underlying manifold. Then, the normal coordinates are computed to map the input high-dimensional data into a low-dimensional space. Experiments on synthetic data, as well as real-world images, demonstrate that our algorithm can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.

  • Efficient Multiclass ROC Approximation by Decomposition via Confusion Matrix Perturbation Analysis

    Page(s): 810 - 822

    Receiver operator characteristic (ROC) analysis has become a standard tool in the design and evaluation of two-class classification problems. It allows for an analysis that incorporates all possible priors, costs, and operating points, which is important in many real problems, where conditions are often nonideal. Extending this to the multiclass case is attractive, conferring the benefits of ROC analysis to a multitude of new problems. Even though the ROC analysis extends theoretically to the multiclass case, the exponential computational complexity as a function of the number of classes is restrictive. In this paper, we show that the multiclass ROC can often be simplified considerably because some ROC dimensions are independent of each other. We present an algorithm that analyzes interactions between various ROC dimensions, identifying independent classes, and groups of interacting classes, allowing the ROC to be decomposed. The resulting decomposed ROC hypersurface can be interrogated in a similar fashion to the ideal case, allowing for approaches such as cost-sensitive and Neyman-Pearson optimization, as well as the volume under the ROC. An extensive bouquet of examples and experiments demonstrates the potential of this methodology.

  • Theoretical Foundations of Spatially-Variant Mathematical Morphology Part I: Binary Images

    Page(s): 823 - 836

    We develop a general theory of spatially-variant (SV) mathematical morphology for binary images in the Euclidean space. The basic SV morphological operators (that is, SV erosion, SV dilation, SV opening, and SV closing) are defined. We demonstrate the ubiquity of SV morphological operators by providing an SV kernel representation of increasing operators. The latter representation is a generalization of Matheron's representation theorem of increasing and translation-invariant operators. The SV kernel representation is redundant, in the sense that a smaller subset of the SV kernel is sufficient for the representation of increasing operators. We provide sufficient conditions for the existence of the basis representation in terms of upper-semicontinuity in the hit-or-miss topology. The latter basis representation is a generalization of Maragos' basis representation for increasing and translation-invariant operators. Moreover, we investigate the upper-semicontinuity property of the basic SV morphological operators. Several examples are used to demonstrate that the theory of spatially-variant mathematical morphology provides a general framework for the unification of various morphological schemes based on spatially-variant geometrical structuring elements (for example, circular, affine, and motion morphology). Simulation results illustrate the theory of the proposed spatially-variant morphological framework and show its potential power in various image processing applications.

  • Theoretical Foundations of Spatially-Variant Mathematical Morphology Part II: Gray-Level Images

    Page(s): 837 - 850

    In this paper, we develop a spatially-variant (SV) mathematical morphology theory for gray-level signals and images in the Euclidean space. The proposed theory preserves the geometrical concept of the structuring function, which provides the foundation of classical morphology and is essential in signal and image processing applications. We define the basic SV gray-level morphological operators (that is, SV gray-level erosion, dilation, opening, and closing) and investigate their properties. We demonstrate the ubiquity of SV gray-level morphological systems by deriving a kernel representation for a large class of systems, called V-systems, in terms of the basic SV gray-level morphological operators. A V-system is defined to be a gray-level operator, which is invariant under gray-level (vertical) translations. Particular attention is focused on the class of SV flat gray-level operators. The kernel representation for increasing V-systems is a generalization of Maragos' kernel representation for increasing and translation-invariant function-processing systems. A representation of V-systems in terms of their kernel elements is established for increasing and upper semicontinuous V-systems. This representation unifies a large class of spatially-variant-linear and nonlinear systems under the same mathematical framework. The theory is used for analyzing special cases of signal and image processing systems such as SV order rank filters and linear-time-varying systems. Finally, simulation results show the potential power of the general theory of gray-level SV mathematical morphology in several image analysis and computer vision applications.

  • Coarse-to-Fine Segmentation and Tracking Using Sobolev Active Contours

    Page(s): 851 - 864

    Recently proposed Sobolev active contours introduced a new paradigm for minimizing energies defined on curves by changing the traditional cost of perturbing a curve and thereby redefining gradients associated to these energies. Sobolev active contours evolve more globally and are less attracted to certain intermediate local minima than traditional active contours, and they are based on a well-structured Riemannian metric, which is important for shape analysis and shape priors. In this paper, we analyze Sobolev active contours using scale-space analysis in order to understand their evolution across different scales. This analysis shows an extremely important and useful behavior of Sobolev contours, namely, that they move successively from coarse to increasingly finer scale motions in a continuous manner. This property illustrates that one justification for using the Sobolev technique is for applications where coarse-scale deformations are preferred over fine-scale deformations. Along with other properties to be discussed, the coarse-to-fine observation reveals that Sobolev active contours are, in particular, ideally suited for tracking algorithms that use active contours. We will also justify our assertion that the Sobolev metric should be used over the traditional metric for active contours in tracking problems by experimentally showing how a variety of active-contour-based tracking methods can be significantly improved merely by evolving the active contour according to the Sobolev method.

  • A Factorization-Based Approach for Articulated Nonrigid Shape, Motion and Kinematic Chain Recovery From Video

    Page(s): 865 - 877

    Recovering articulated shape and motion, especially human body motion, from video is a challenging problem with a wide range of applications in medical study, sport analysis, animation, and so forth. Previous work on articulated motion recovery generally requires prior knowledge of the kinematic chain and usually does not concern the recovery of the articulated shape. The nonrigidity of some articulated part, for example, human body motion with nonrigid facial motion, is completely ignored. We propose a factorization-based approach to recover the shape, motion, and kinematic chain of an articulated object with nonrigid parts altogether directly from video sequences under a unified framework. The proposed approach is based on our modeling of the articulated nonrigid motion as a set of intersecting motion subspaces. A motion subspace is the linear subspace of the trajectories of an object. It can model a rigid or nonrigid motion. The intersection of two motion subspaces of linked parts models the motion of an articulated joint or axis. Our approach consists of algorithms for motion segmentation, kinematic chain building, and shape recovery. It handles outliers and can be automated. We test our approach through synthetic and real experiments and demonstrate how to recover an articulated structure with nonrigid parts via a single-view camera without prior knowledge of its kinematic chain.

  • Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors

    Page(s): 878 - 892

    This paper describes methods for recovering time-varying shape and motion of nonrigid 3D objects from uncalibrated 2D point tracks. For example, given a video recording of a talking person, we would like to estimate the 3D shape of the face at each instant and learn a model of facial deformation. Time-varying shape is modeled as a rigid transformation combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed, and thus additional assumptions about deformations are required. We first suggest restricting shapes to lie within a low-dimensional subspace and describe estimation algorithms. However, this restriction alone is insufficient to constrain reconstruction. To address these problems, we propose a reconstruction method using a Probabilistic Principal Components Analysis (PPCA) shape model and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills in missing data points. We then extend the model to represent temporal dynamics in object shape, allowing the algorithm to robustly handle severe cases of missing data.

  • Video Behavior Profiling for Anomaly Detection

    Page(s): 893 - 908

    This paper aims to address the problem of modeling video behavior captured in surveillance videos for the applications of online normal behavior recognition and anomaly detection. A novel framework is developed for automatic behavior profiling and online anomaly sampling/detection without any manual labeling of the training data set. The framework consists of the following key components: 1) A compact and effective behavior representation method is developed based on discrete-scene event detection. The similarity between behavior patterns is measured based on modeling each pattern using a Dynamic Bayesian Network (DBN). 2) The natural grouping of behavior patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalized affinity matrix. 3) A composite generative behavior model is constructed that is capable of generalizing from a small training set to accommodate variations in unseen normal behavior patterns. 4) A runtime accumulative anomaly measure is introduced to detect abnormal behavior, whereas normal behavior patterns are recognized when sufficient visual evidence has become available based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behavior recognition at the shortest possible time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy and sparse data sets collected from both indoor and outdoor surveillance scenarios. In particular, it is shown that a behavior model trained using an unlabeled data set is superior to those trained using the same but labeled data set in detecting anomalies in an unseen video. The experiments also suggest that our online LRT-based behavior recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behavior classes observed online.

  • Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures

    Page(s): 909 - 926

    A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.

  • In this issue - Technically

    Page(s): 927
    Freely Available from IEEE
  • Join the IEEE Computer Society [advertisement]

    Freely Available from IEEE
  • Correction to "MAC: Magnetostatic Active Contour Model" [Apr 08 632-646]

    Page(s): Online Only

    In the above titled paper (ibid., vol. 30, no. 4, pp. 632-646, Apr 08), there was an error in a definition. The correct definition is presented here.

  • TPAMI Information for authors

    Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois