IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 5 • May 2005

  • [Front cover]

    Publication Year: 2005 , Page(s): c1
    Cited by:  Papers (1)
    PDF (151 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2005 , Page(s): c2
    PDF (771 KB)
    Freely Available from IEEE
  • Automated variable weighting in k-means type clustering

    Publication Year: 2005 , Page(s): 657 - 668
    Cited by:  Papers (105)  |  Patents (1)
    PDF (1326 KB) | HTML

    This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for the weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications, where large and complex real data are often involved. Experimental results on both synthetic and real data show that the new algorithm outperforms standard k-means type algorithms in recovering clusters in data.

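    As a rough illustration of the idea (not the paper's exact algorithm), the sketch below runs a k-means loop with a per-feature weight update driven by within-cluster dispersions; the update formula and the exponent beta follow the common W-k-means formulation, and all names are hypothetical.

    ```python
    import numpy as np

    def weighted_kmeans(X, k, beta=2.0, iters=50, seed=0):
        """Sketch: k-means with automatically updated feature weights.
        Weights shrink for features with large within-cluster dispersion."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        centers = X[rng.choice(n, k, replace=False)].astype(float)
        w = np.full(d, 1.0 / d)                      # feature weights, sum to 1
        for _ in range(iters):
            # assignment step under the weighted squared distance
            dist = (((X[:, None, :] - centers[None]) ** 2) * w**beta).sum(-1)
            labels = dist.argmin(1)
            for j in range(k):                       # center update
                if (labels == j).any():
                    centers[j] = X[labels == j].mean(0)
            # weight update from the per-feature within-cluster dispersion D
            D = np.maximum(((X - centers[labels]) ** 2).sum(0), 1e-12)
            w = 1.0 / ((D[:, None] / D[None, :]) ** (1.0 / (beta - 1))).sum(1)
        return labels, w, centers
    ```
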
  • An integration of online and pseudo-online information for cursive word recognition

    Publication Year: 2005 , Page(s): 669 - 683
    Cited by:  Papers (3)
    PDF (1511 KB) | HTML

    In this paper, we present a novel method to extract stroke-order-independent information from online data. This information, which we term pseudo-online, conveys relevant information about the offline representation of the word. Based on this information, classification decisions from online and pseudo-online cursive word recognizers are combined to improve the recognition of online cursive words. One of the most valuable aspects of this approach, compared with similar methods that combine online and offline classifiers for word recognition, is that the pseudo-online representation is similar to the online signal and, hence, word recognition is based on a single engine. Results demonstrate that the pseudo-online representation is useful, as the combination of classifiers performs better than those based solely on pure online information.

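    The combination step can be pictured as score-level fusion of the two recognizers' word hypotheses. The sketch below is a generic weighted sum rule with hypothetical names; the paper does not specify this exact rule.

    ```python
    def fuse_word_scores(online, pseudo, alpha=0.5):
        """Weighted sum-rule fusion of per-word scores from the online and
        pseudo-online recognizers (dicts mapping word -> normalized score)."""
        vocab = set(online) | set(pseudo)
        fused = {w: alpha * online.get(w, 0.0) + (1 - alpha) * pseudo.get(w, 0.0)
                 for w in vocab}
        return max(fused, key=fused.get)             # best word hypothesis

    # example: the two engines partly disagree; fusion picks the consensus word
    print(fuse_word_scores({"word": 0.6, "work": 0.4},
                           {"word": 0.7, "cord": 0.3}))
    ```
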
  • Acquiring linear subspaces for face recognition under variable lighting

    Publication Year: 2005 , Page(s): 684 - 698
    Cited by:  Papers (336)  |  Patents (1)
    PDF (1155 KB) | HTML

    Previous work has demonstrated that the image variation of many objects (human faces in particular) under variable lighting can be effectively modeled by low-dimensional linear spaces, even when there are multiple light sources and shadowing. Basis images spanning this space are usually obtained in one of three ways: a large set of images of the object under different lighting conditions is acquired and principal component analysis (PCA) is used to estimate a subspace; synthetic images are rendered from a 3D model (perhaps reconstructed from images) under point sources and, again, PCA is used to estimate a subspace; or images rendered from a 3D model under diffuse lighting based on spherical harmonics are used directly as basis images. In this paper, we show how to arrange physical lighting so that the acquired images of each object can be used directly as the basis vectors of a low-dimensional linear space, and that this subspace is close to those acquired by the other methods. More specifically, there exist configurations of k point light source directions, with k typically ranging from 5 to 9, such that, by taking k images of an object under these single sources, the resulting subspace is an effective representation for recognition under a wide range of lighting conditions. Since the subspace is generated directly from real images, potentially complex and/or brittle intermediate steps such as 3D reconstruction can be completely avoided; nor is it necessary to acquire large numbers of training images or to physically construct complex diffuse (harmonic) light fields. We validate the use of subspaces constructed in this fashion within the context of face recognition.

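    A minimal sketch of the recognition-by-subspace step: stack each subject's k single-light-source images into a basis and classify a probe by its residual to each span. The QR orthonormalization and residual criterion are standard linear algebra, not code from the paper.

    ```python
    import numpy as np

    def build_subspace(images):
        """Orthonormal basis for the span of k images (k roughly 5 to 9),
        each image given as a 2D array and flattened into a column."""
        A = np.stack([im.ravel().astype(float) for im in images], axis=1)
        Q, _ = np.linalg.qr(A)
        return Q

    def recognize(probe, subspaces):
        """Return the identity whose subspace best reconstructs the probe."""
        x = probe.ravel().astype(float)
        residual = {name: np.linalg.norm(x - Q @ (Q.T @ x))
                    for name, Q in subspaces.items()}
        return min(residual, key=residual.get)
    ```
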
  • Active and dynamic information fusion for facial expression understanding from image sequences

    Publication Year: 2005 , Page(s): 699 - 714
    Cited by:  Papers (110)  |  Patents (6)
    PDF (2024 KB) | HTML

    This paper explores the use of multisensory information fusion with dynamic Bayesian networks (DBN) for modeling and understanding the temporal behavior of facial expressions in image sequences. Our facial feature detection and tracking, based on active IR illumination, provides reliable visual information under variable lighting and head motion. Our approach to facial expression recognition lies in a dynamic and probabilistic framework that combines DBN with Ekman's facial action coding system (FACS) to systematically model the dynamic and stochastic behavior of spontaneous facial expressions. The framework not only provides a coherent and unified hierarchical probabilistic representation of the spatial and temporal information related to facial expressions, but also allows us to actively select the most informative visual cues from the available information sources to minimize ambiguity in recognition. Facial expressions are recognized by fusing not only the current visual observations but also previous visual evidence; consequently, recognition becomes more robust and accurate through explicit modeling of the temporal behavior of facial expressions. In this paper, we present the theoretical foundation underlying the proposed probabilistic and dynamic framework for facial expression modeling and understanding. Experimental results demonstrate that our approach can accurately and robustly recognize spontaneous facial expressions from an image sequence under different conditions.

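    At its core, fusing current observations with previous evidence is a recursive Bayesian update over expression states. The toy filter below shows that single step; it is a drastic simplification of the paper's DBN, and all matrices are hypothetical.

    ```python
    import numpy as np

    def update_belief(belief, transition, likelihood):
        """One recursive fusion step: propagate the previous belief through
        the temporal model, then reweight by the current observation
        likelihoods P(cues | expression) and renormalize."""
        predicted = belief @ transition          # temporal (previous evidence)
        posterior = predicted * likelihood       # current visual observations
        return posterior / posterior.sum()

    # example with three expression states
    belief = np.array([0.6, 0.3, 0.1])
    T = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
    print(update_belief(belief, T, np.array([0.2, 0.7, 0.1])))
    ```
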
  • Automatic image orientation detection via confidence-based integration of low-level and semantic cues

    Publication Year: 2005 , Page(s): 715 - 726
    Cited by:  Papers (17)  |  Patents (3)
    PDF (1471 KB) | HTML

    Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture, and discrepant detection rates have been reported for them in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, which is impressive given the findings of a recent psychophysical study. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.

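    Confidence-based integration over the four candidate rotations can be sketched as a log-linear combination of per-cue posteriors, each raised to a weight reflecting its confidence. This is one plausible reading of the abstract, not the paper's exact estimator.

    ```python
    import numpy as np

    def fuse_orientation_cues(cue_posteriors, confidences):
        """Fuse posteriors over the four rotations (0, 90, 180, 270 degrees);
        each cue contributes in proportion to its confidence weight."""
        log_p = np.zeros(4)
        for p, c in zip(cue_posteriors, confidences):
            log_p += c * np.log(np.asarray(p, dtype=float) + 1e-12)
        p = np.exp(log_p - log_p.max())          # stabilize before normalizing
        return p / p.sum()

    # a low-level color cue (confidence 0.4) and a semantic cue (confidence 0.9)
    print(fuse_orientation_cues([[0.4, 0.3, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]],
                                [0.4, 0.9]))
    ```
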
  • Incremental model-based estimation using geometric constraints

    Publication Year: 2005 , Page(s): 727 - 738
    Cited by:  Papers (2)
    PDF (1635 KB) | HTML

    We present a model-based framework for incremental, adaptive object shape estimation and tracking in monocular image sequences. Parametric structure and motion estimation methods usually assume a fixed class of shape representation (splines, deformable superquadrics, etc.) that is initialized prior to tracking. Since the model's shape coverage is fixed a priori, the incremental recovery of structure is decoupled from tracking, limiting both processes in their scope and robustness. In this work, we describe a model-based framework that supports the automatic detection and incremental integration of low-level geometric primitives (lines) that are not explicitly captured in the initial model but move consistently with its image motion. The consistency tests used to identify new structure are based on trinocular constraints between geometric primitives. The method not only increases the model's scope, but also improves tracking accuracy by including the newly recovered features in the state estimation. The formulation is a step toward automatic model building, since it weakens the assumptions on both the availability of a prior shape representation and the number of features that would otherwise be necessary for an entirely bottom-up reconstruction. We demonstrate the proposed approach on two separate image-based tracking domains, each involving complex 3D object structure and motion.

  • A voting-based computational framework for visual motion analysis and interpretation

    Publication Year: 2005 , Page(s): 739 - 752
    Cited by:  Papers (13)  |  Patents (1)
    PDF (1469 KB) | HTML

    Most approaches to motion analysis and interpretation rely on restrictive parametric models and involve iterative methods that depend heavily on initial conditions and are subject to instability. Further difficulties are encountered in image regions where motion is not smooth, typically around motion boundaries. This work addresses the problem of visual motion analysis and interpretation by formulating it as the inference of motion layers from a noisy and possibly sparse point set in a 4D space. The core of the method is a layered 4D representation of the data and a voting scheme for affinity propagation. The inherent ambiguity of 2D-to-3D interpretation is usually handled by adding constraints such as rigidity; however, enforcing such a global constraint has been problematic in the combined presence of noise and multiple independent motions. By decoupling the processes of matching, outlier rejection, segmentation, and interpretation, we extract accurate motion layers based on the smoothness of image motion, and then locally enforce rigidity for each layer in order to infer its 3D structure and motion. The proposed framework is noniterative and consistently handles both smooth moving regions and motion discontinuities without using any prior knowledge of the motion model.

  • Learning viewpoint invariant perceptual representations from cluttered images

    Publication Year: 2005 , Page(s): 753 - 761
    Cited by:  Papers (7)
    PDF (727 KB) | HTML

    In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that can overcome this limitation and results in more robust learning of invariant representations.

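    The temporal-association method the paper builds on is usually implemented with a trace learning rule: a unit's weight update is driven by a temporally smoothed copy of its activity, so views seen close together in time become associated. A minimal sketch follows (names hypothetical; the paper's modification for cluttered input is not reproduced here):

    ```python
    import numpy as np

    def trace_update(W, x, y, y_trace, alpha=0.01, eta=0.8):
        """One trace-rule step: blend current activity into a running trace,
        then apply a Hebbian update using the trace instead of raw activity.
        W: weights (units x inputs); x: input; y: current unit activity."""
        y_trace = eta * y_trace + (1 - eta) * y
        W = W + alpha * np.outer(y_trace, x)
        W /= np.linalg.norm(W, axis=1, keepdims=True)   # keep weights bounded
        return W, y_trace
    ```
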
  • Precision range image registration using a robust surface interpenetration measure and enhanced genetic algorithms

    Publication Year: 2005 , Page(s): 762 - 776
    Cited by:  Papers (43)
    PDF (2934 KB) | HTML

    This paper addresses the registration of range images that have low overlap and may include substantial noise. The current state of the art in range image registration is best represented by the well-known iterative closest point (ICP) algorithm and its numerous variants. Although effective in many domains, this method suffers from two key limitations: it requires prealignment of the range surfaces to a reasonable starting point, and it is not robust to outliers arising either from noise or from low surface overlap. This paper proposes a new approach that avoids these problems through two key, novel contributions: a hybrid genetic algorithm (GA) technique, including hill climbing and parallel migration, combined with a new, robust evaluation metric based on surface interpenetration. Until now, interpenetration has been evaluated only qualitatively; we define the first quantitative measure for it. Because they search in a space of transformations, GAs can register surfaces even when the overlap between them is low, and without prealignment. The novel GA search algorithm we present converges much faster than prior GA methods, while the new robust evaluation metric yields more precise alignments, even in the presence of significant noise, than mean squared error or other well-known robust cost functions. The paper presents thorough experimental results to show the improvements realized by these two contributions.

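    A toy version of the search component: a GA over 6-DOF transform parameters with truncation selection, Gaussian mutation, and a pluggable fitness. An interpenetration-style score would be supplied as `fitness`; both it and all names here are hypothetical simplifications of the paper's hybrid GA.

    ```python
    import numpy as np

    def ga_register(fitness, pop_size=50, gens=100, sigma=0.05, seed=0):
        """Search 6-DOF transforms (3 rotations, 3 translations); `fitness`
        scores alignment quality, e.g. a surface interpenetration measure
        (higher is better)."""
        rng = np.random.default_rng(seed)
        pop = rng.normal(0.0, 1.0, (pop_size, 6))
        for _ in range(gens):
            scores = np.array([fitness(p) for p in pop])
            elite = pop[np.argsort(-scores)][: pop_size // 2]   # best half
            children = elite + rng.normal(0.0, sigma, elite.shape)  # mutate
            pop = np.vstack([elite, children])
        scores = np.array([fitness(p) for p in pop])
        return pop[scores.argmax()]
    ```
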
  • A parallel-line detection algorithm based on HMM decoding

    Publication Year: 2005 , Page(s): 777 - 792
    Cited by:  Papers (14)
    PDF (1604 KB) | HTML

    The detection of groups of parallel lines is important in applications such as form processing and text (handwriting) extraction from rule-lined paper. These tasks can be very challenging in degraded documents where the lines are severely broken. In this paper, we propose a novel model-based method that incorporates high-level context to detect such lines. After preprocessing (such as skew correction and text filtering), we use trained hidden Markov models (HMM) to locate the optimal positions of all lines simultaneously on the horizontal or vertical projection profiles via Viterbi decoding. The algorithm is trainable, so it can easily be adapted to different application scenarios. Experiments on form processing and rule-line detection show that our method is robust and achieves better results than other widely used line detection methods.

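    The decoding step is standard Viterbi over the projection profile, with (for instance) "line" and "gap" states whose emissions score the profile value at each position. A generic sketch in log space (the paper's state topology and trained parameters are not reproduced):

    ```python
    import numpy as np

    def viterbi(log_init, log_trans, log_emit):
        """log_emit[t, s]: score of profile position t under state s.
        Returns the most likely state sequence (e.g. line/gap labels)."""
        T, S = log_emit.shape
        delta = log_init + log_emit[0]
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            cand = delta[:, None] + log_trans    # S x S transition scores
            back[t] = cand.argmax(0)
            delta = cand.max(0) + log_emit[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):            # backtrack
            path.append(int(back[t, path[-1]]))
        return path[::-1]
    ```
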
  • Multiregion level-set partitioning of synthetic aperture radar images

    Publication Year: 2005 , Page(s): 793 - 800
    Cited by:  Papers (69)
    PDF (1402 KB) | HTML

    The purpose of this study is to investigate the segmentation of synthetic aperture radar (SAR) images into a given but arbitrary number of gamma-homogeneous regions via active contours and level sets. The segmentation of SAR images is a difficult problem due to the presence of speckle, which can be modeled as strong multiplicative noise. The proposed algorithm evolves simple closed planar curves, with an explicit correspondence between the interiors of the curves and the regions of the segmentation, to minimize a criterion containing a term of conformity of the data to a gamma speckle model of noise and a regularization term. Results are shown on both synthetic and real images.

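    The data-conformity term for one region can be written from the multilook gamma speckle model: up to constants, the negative log-likelihood of an intensity I in a region of mean mu is L(log mu + I/mu). A small sketch of that term under these assumptions (parameter names are illustrative):

    ```python
    import numpy as np

    def gamma_data_term(I, mask, looks=4):
        """Negative log-likelihood (up to constants) of the pixels selected
        by `mask` under a gamma speckle model with `looks` looks and the
        region's maximum-likelihood mean."""
        region = I[mask].astype(float)
        mu = region.mean()
        return looks * np.sum(np.log(mu) + region / mu)
    ```
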
  • A novel kernel method for clustering

    Publication Year: 2005 , Page(s): 801 - 805
    Cited by:  Papers (77)
    PDF (415 KB) | HTML

    Kernel methods are algorithms that, by replacing the inner product with an appropriate positive definite function, implicitly perform a nonlinear mapping of the input data into a high-dimensional feature space. In this paper, we present a kernel method for clustering inspired by the classical k-means algorithm, in which each cluster is iteratively refined using a one-class support vector machine. Our method, which can be easily implemented, compares favorably with popular clustering algorithms such as k-means, neural gas, and self-organizing maps on a synthetic data set and three UCI real-data benchmarks (Iris data, Wisconsin breast cancer database, Spam database).

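    A rough sketch of the refinement loop using scikit-learn's one-class SVM: each cluster is described by its own model, and points move to the cluster whose model scores them highest. This is a plausible simplification, not the paper's exact procedure.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    def ocsvm_clustering(X, k=3, iters=10, seed=0):
        """k-means-style loop where each cluster is refined by a one-class
        SVM with an RBF kernel."""
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=len(X))
        for _ in range(iters):
            models = []
            for j in range(k):
                idx = labels == j
                if not idx.any():                # reseed an empty cluster
                    idx = rng.integers(0, len(X), size=2)
                models.append(OneClassSVM(kernel="rbf", nu=0.2).fit(X[idx]))
            scores = np.column_stack([m.decision_function(X) for m in models])
            labels = scores.argmax(1)            # move points to best model
        return labels
    ```
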
  • Glasses removal from facial image using recursive error compensation

    Publication Year: 2005 , Page(s): 805 - 811
    Cited by:  Papers (33)
    PDF (1171 KB) | HTML

    In this paper, we propose a new method for removing glasses from a human frontal facial image. We first detect the regions occluded by the glasses and then generate a natural-looking facial image without glasses by recursive error compensation using PCA reconstruction. The resulting image has no trace of the glasses frame or of the reflections and shadows caused by the glasses. The experimental results show that the proposed method provides an effective solution to the problem of glasses occlusion, and we believe it can also be used to enhance the performance of face recognition systems.

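    The recursive error compensation can be sketched as an iterated PCA projection in which only the occluded pixels are overwritten by the reconstruction on each pass; unoccluded pixels keep their original values, so the eigenface coefficients gradually stop being corrupted by the glasses. Names and the eigenface basis below are assumptions, not the paper's code.

    ```python
    import numpy as np

    def remove_occlusion(x, mean, basis, mask, iters=20):
        """x: flattened face; mean: mean face; basis: columns are orthonormal
        eigenfaces; mask: True where the glasses region was detected."""
        y = x.astype(float).copy()
        for _ in range(iters):
            coeffs = basis.T @ (y - mean)        # project onto the face space
            recon = mean + basis @ coeffs        # PCA reconstruction
            y[mask] = recon[mask]                # compensate occluded pixels only
        return y
    ```
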
  • A video database of moving faces and people

    Publication Year: 2005 , Page(s): 812 - 816
    Cited by:  Papers (34)
    PDF (504 KB) | HTML

    We describe a database of static images and video clips of human faces and people that is useful for testing algorithms for face and person recognition, head/eye tracking, and computer graphics modeling of natural human motions. For each person there are nine static "facial mug shots" and a series of video streams. The videos include a "moving facial mug shot," a facial speech clip, one or more dynamic facial expression clips, two gait videos, and a conversation video taken at a moderate distance from the camera. Complete data sets are available for 284 subjects and duplicate data sets, taken subsequent to the original set, are available for 229 subjects.

  • Clutter invariant ATR

    Publication Year: 2005 , Page(s): 817 - 821
    Cited by:  Papers (1)
    PDF (353 KB) | HTML

    One of the central problems in automated target recognition is accommodating the infinite variety of clutter in real military environments. The principal focus of our paper is the construction of metric spaces in which the metric measures the distance between objects of interest invariantly to the infinite variety of clutter. Such metrics are formulated using second-order random field models. Our results indicate that this approach significantly improves detection/classification rates for targets in clutter.

  • Orientation in Manhattan: equiprojective classes and sequential estimation

    Publication Year: 2005 , Page(s): 822 - 827
    Cited by:  Papers (3)
    PDF (606 KB) | HTML

    The problem of inferring the 3D orientation of a camera from video sequences has mostly been addressed by first computing correspondences of image features, an intermediate step now seen as the main bottleneck of those approaches. In this paper, we propose a new 3D orientation estimation method for urban (indoor and outdoor) environments that avoids correspondences between frames. The scene property exploited by our method is that many edges are oriented along three orthogonal directions, the recently introduced Manhattan world (MW) assumption. The main contributions of this paper are: the definition of equivalence classes of equiprojective orientations; the introduction of a new small-rotation model, formalizing the fact that the camera moves smoothly; and the decoupling of elevation and twist angle estimation from that of the compass angle. We build a probabilistic sequential orientation estimation method based on an MW likelihood model, with the above contributions allowing a drastic reduction of the search space for each orientation estimate. We demonstrate the performance of our method on real video sequences.

  • Effective Gaussian mixture learning for video background subtraction

    Publication Year: 2005 , Page(s): 827 - 832
    Cited by:  Papers (244)
    PDF (615 KB) | HTML

    Adaptive Gaussian mixtures have been used to model nonstationary temporal distributions of pixels in video surveillance applications. A common problem for this approach, however, is balancing model convergence speed against stability. This paper proposes an effective scheme to improve the convergence rate without compromising model stability, achieved by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame. Significant improvements are shown on both synthetic and real video data. Incorporating this algorithm into a statistical framework for background subtraction leads to improved segmentation performance compared to a standard method.

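    The key idea, an adaptive per-Gaussian learning rate that starts near 1/count for fast convergence and decays toward a small floor for stability, can be sketched for a single background Gaussian per pixel (the full method maintains a mixture; the thresholds below are illustrative):

    ```python
    class AdaptivePixelModel:
        """One background Gaussian with an adaptive learning rate."""
        def __init__(self, x0, floor=0.005):
            self.mu, self.var, self.n, self.floor = float(x0), 15.0 ** 2, 1, floor
        def update(self, x):
            self.n += 1
            lr = max(1.0 / self.n, self.floor)   # fast early, stable later
            d = x - self.mu
            self.mu += lr * d
            self.var += lr * (d * d - self.var)
            return d * d > 6.25 * self.var       # foreground if beyond 2.5 sigma
    ```
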
  • [Inside back cover]

    Publication Year: 2005 , Page(s): c3
    PDF (771 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2005 , Page(s): c4
    PDF (151 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois