
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 10 • October 1999

  • An HMM-based threshold model approach for gesture recognition

    Page(s): 961 - 973

    A new method for gesture recognition is developed using a hidden Markov model (HMM)-based technique. To handle nongesture patterns, we introduce the concept of a threshold model that calculates the likelihood threshold of an input pattern and provides a confirmation mechanism for provisionally matched gesture patterns. The threshold model is a weak model for all trained gestures in the sense that its likelihood is smaller than that of the dedicated gesture model for a given gesture. Consequently, its likelihood can be used as an adaptive threshold for selecting the proper gesture model. Because the threshold model is constructed by collecting the states of all gesture models in the system, however, it has a large number of states, which must be reduced. To overcome this problem, states with similar probability distributions are merged using the relative entropy measure. Experimental results show that the proposed method can successfully extract trained gestures from continuous hand motion with 93.14% reliability.

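The state-merging step and the adaptive threshold described in the gesture-recognition abstract can be illustrated with a minimal Python sketch. It assumes each state has a discrete output distribution, uses a symmetrized relative entropy (an assumption; the abstract only names relative entropy), and merges the closest pair by simple averaging. The function names and the merge threshold are hypothetical; this is not the authors' implementation.

```python
import math

def relative_entropy(p, q, eps=1e-12):
    """D(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_divergence(p, q):
    # Relative entropy is asymmetric; averaging both directions is an assumed choice.
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))

def merge_similar_states(states, threshold):
    """Greedily merge the closest pair of state distributions until every
    remaining pair differs by more than `threshold`."""
    states = [list(s) for s in states]
    while len(states) > 1:
        best = None
        for i in range(len(states)):
            for j in range(i + 1, len(states)):
                d = symmetric_divergence(states[i], states[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = [(a + b) / 2 for a, b in zip(states[i], states[j])]
        states = [s for k, s in enumerate(states) if k not in (i, j)] + [merged]
    return states

def accept_gesture(best_gesture_loglik, threshold_model_loglik):
    """Adaptive threshold: accept only if the dedicated gesture model beats the threshold model."""
    return best_gesture_loglik > threshold_model_loglik
```

In this sketch the threshold model's likelihood plays the role of `threshold_model_loglik`, so the acceptance test adapts to each input pattern instead of relying on a fixed cutoff.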
  • Postprocessing of recognized strings using nonstationary Markovian models

    Page(s): 990 - 999

    This paper presents nonstationary Markovian models and their application to the recognition of strings of tokens. Domain-specific knowledge is brought to bear on recognizing zip codes in the US mailstream through the use of postal directory files. These files provide a wealth of information on the delivery points (mailstops) corresponding to each zip code. This data feeds into the models as n-grams, statistics that are integrated with the recognition scores of digit images. An especially interesting facet of the model is its ability to excite and inhibit certain positions in the n-grams, which leads to the familiar area of Markov random fields. We empirically illustrate the success of Markovian modeling in postprocessing applications of string recognition and present the recognition accuracy of the different models on a set of 20,000 zip codes. The performance is superior to that of the present system, which ignores all contextual information and relies solely on the recognition scores of the digit recognizers.

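To make the combination of n-gram statistics with recognizer scores concrete, here is a minimal sketch that decodes a digit string by Viterbi search over bigrams whose transition table changes with position, which is one simple sense in which such a model is nonstationary. It is a simplification under stated assumptions: only bigrams are used, all scores are log probabilities, and the excitation/inhibition and Markov-random-field aspects of the paper are not modeled. The function and argument names are hypothetical.

```python
import math

def decode_zip(recog_logprob, first_logprob, trans_logprob):
    """Viterbi decoding with position-dependent (nonstationary) bigrams.

    recog_logprob[t][d]    : recognizer log-score for digit d at position t
    first_logprob[d]       : log P(first digit = d)
    trans_logprob[t][a][b] : log P(digit t+1 = b | digit t = a), one table per position t
    """
    n, k = len(recog_logprob), len(first_logprob)
    score = [first_logprob[d] + recog_logprob[0][d] for d in range(k)]
    back = []  # back[t-1][b] = best predecessor of digit b at position t
    for t in range(1, n):
        new_score, pointers = [], []
        for b in range(k):
            a_best = max(range(k), key=lambda a: score[a] + trans_logprob[t - 1][a][b])
            new_score.append(score[a_best] + trans_logprob[t - 1][a_best][b] + recog_logprob[t][b])
            pointers.append(a_best)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final digit.
    path = [max(range(k), key=lambda d: score[d])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

With flat transition tables the digit recognizer's preference decides each position; with a position-specific table, context can override a weak digit score.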
  • Alignment using distributions of local geometric properties

    Page(s): 1031 - 1043

    We describe a framework for aligning images without needing to establish explicit feature correspondences. We assume that the geometry between the two images can be adequately described by an affine transformation and develop a framework that uses the statistical distribution of geometric properties of image contours to estimate the relevant transformation parameters. The estimates obtained with the proposed method are robust to illumination conditions, sensor characteristics, and similar factors, since image contours are relatively invariant to these changes. Moreover, the distributional nature of the method alleviates some common problems caused by contour fragmentation, occlusion, and clutter. We provide empirical evidence of the accuracy and robustness of our algorithm. Finally, we demonstrate the method on both real and synthetic images, including multisensor image pairs.

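The idea of estimating a transformation from distributions rather than correspondences can be illustrated in its simplest case. The sketch below assumes a pure rotation (not the full affine model in the abstract) and takes as input lists of contour tangent orientations in degrees; the rotation is the circular shift that best aligns the two orientation histograms. The bin count and function names are illustrative choices, not the paper's estimator.

```python
def orientation_histogram(angles_deg, bins=72):
    """Normalized histogram of contour tangent orientations over [0, 360)."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for a in angles_deg:
        hist[int((a % 360.0) // width) % bins] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def estimate_rotation(angles_a, angles_b, bins=72):
    """Rotation in degrees that best maps the orientation distribution of A onto B,
    found by maximizing the circular cross-correlation of the two histograms."""
    ha = orientation_histogram(angles_a, bins)
    hb = orientation_histogram(angles_b, bins)

    def corr(shift):
        return sum(ha[i] * hb[(i + shift) % bins] for i in range(bins))

    best_shift = max(range(bins), key=corr)
    return best_shift * 360.0 / bins
```

Because only histograms are compared, a few missing or fragmented contour pieces perturb the estimate without requiring any point-to-point matching; the resolution is limited to the bin width (5 degrees with the default of 72 bins).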
  • Classifying facial actions

    Page(s): 974 - 989

    The Facial Action Coding System (FACS) is an objective method for quantifying facial movement in terms of component actions. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. The performance of these systems is compared with that of naive and expert human subjects. The best performance was obtained with the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy in classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.

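Since one of the two best-performing representations in this comparison was based on Gabor wavelets, a minimal sketch of a single Gabor filter may help. It builds the real (even) part of a 2D Gabor kernel and evaluates its response at one pixel. The kernel size, wavelength, orientation, and envelope width are hypothetical parameters; the paper's filter bank (number of scales and orientations, normalization) is not reproduced here.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even-symmetric) part of a 2D Gabor filter: a cosine carrier oriented
    by `theta`, under an isotropic Gaussian envelope of width `sigma`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # position along the carrier direction
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_response(image, kernel, cy, cx):
    """Correlate `kernel` with `image` at pixel (cy, cx); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * kernel[dy + half][dx + half]
    return total
```

A full representation would evaluate a bank of such kernels at several orientations and wavelengths and stack the responses into a feature vector; a grating aligned with the carrier produces a much larger response than one rotated by 90 degrees.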
  • The translation sensitivity of wavelet-based registration

    Page(s): 1074 - 1081

    This paper studies the effects of image translation on wavelet-based image registration. The main result is that the normalized correlation coefficients of low-pass Haar and Daubechies wavelet subbands are essentially insensitive to translation for features larger than twice the wavelet block size. The third-level low-pass subbands produce a correlation peak that varies with translation from 0.7 to 1.0, with an average in excess of 0.9. Translation sensitivity is limited to the high-pass subbands, and even these are potentially useful: the correlation peak for high-pass subbands derived from first- and second-level low-pass subbands ranges from about 0.0 to 1.0, with an average of about 0.5 for Daubechies and 0.7 for Haar. We develop these results with a mathematical model and confirm them on real data.

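The low-pass result can be explored in miniature. The sketch below applies an unnormalized Haar low-pass step (average adjacent pairs, halving the length) for a given number of levels and then computes the normalized correlation coefficient between a signal and a translated copy. It is a 1D simplification under our own assumptions (circular shift, a single sinusoid whose 64-sample period is much larger than the 8-sample block at level 3), not the paper's 2D model; the scaling of the Haar filter does not matter because the coefficient is scale-invariant.

```python
import math

def haar_lowpass(signal, levels):
    """Unnormalized Haar approximation: average adjacent pairs, `levels` times."""
    for _ in range(levels):
        signal = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return signal

def normalized_correlation(a, b):
    """Normalized correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

sig = [math.sin(2 * math.pi * i / 64) for i in range(256)]
shifted = sig[1:] + sig[:1]  # translate by one sample (circularly)
level3_corr = normalized_correlation(haar_lowpass(sig, 3), haar_lowpass(shifted, 3))
```

For this input the level-3 coefficient comes out at roughly 0.995 (the cosine of the one-sample phase shift), consistent with the abstract's observation that low-pass subbands tolerate translation of large features.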
  • Identification of fork points on the skeletons of handwritten Chinese characters

    Page(s): 1095 - 1100

    This paper describes techniques for stroke extraction in the recognition of handwritten Chinese characters. A new set of feature points is proposed for the analysis of skeleton images. Based on a geometrical graph, a novel criterion is proposed for identifying fork points in a skeleton image that correspond to joint points in the original character image. Experimental results indicate that the proposed method correctly determines the fork points and is effective in unifying the joint points.

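For readers new to skeleton analysis, a common baseline for locating fork points is the crossing number: the count of 0-to-1 transitions around a pixel's 8-neighborhood, with three or more indicating a branch point. The sketch below implements that baseline on a binary skeleton given as nested lists. It is a generic local test shown only for orientation, not the geometrical-graph criterion proposed in the paper; the names are hypothetical.

```python
# Clockwise 8-neighbour offsets (dy, dx), starting at north.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(skel, y, x):
    """Number of 0->1 transitions walking once around pixel (y, x)."""
    vals = []
    for dy, dx in NEIGHBOURS:
        yy, xx = y + dy, x + dx
        inside = 0 <= yy < len(skel) and 0 <= xx < len(skel[0])
        vals.append(skel[yy][xx] if inside else 0)
    return sum(1 for i in range(8) if vals[i] == 0 and vals[(i + 1) % 8] == 1)

def fork_points(skel):
    """Skeleton pixels where three or more branches meet (crossing number >= 3)."""
    return [(y, x) for y in range(len(skel)) for x in range(len(skel[0]))
            if skel[y][x] and crossing_number(skel, y, x) >= 3]
```

End points have a crossing number of 1 and ordinary stroke pixels a crossing number of 2, so the same function also separates those cases.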
  • Analysis of class separation and combination of class-dependent features for handwriting recognition

    Page(s): 1089 - 1094

    In this paper, we propose a new approach to combining multiple features in handwriting recognition based on two ideas: feature selection-based combination and class-dependent features. A nonparametric method is used for feature evaluation, and the first part of the paper is devoted to evaluating features in terms of their class separation and recognition capabilities. In the second part, multiple feature vectors are combined to produce a new feature vector. Because a feature has different discriminating power for different classes, a new scheme for selecting and combining class-dependent features is proposed, in which each class is considered to have its own optimal feature vector for discriminating it from the other classes. Using an architecture of modular neural networks as the classifier, a series of experiments was conducted on unconstrained handwritten numerals. The results indicate that the selected features are effective in separating pattern classes and that the new feature vector, derived from a combination of two types of such features, further improves the recognition rate.

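The class-dependent selection idea can be sketched with a simple nonparametric separability score. Each feature is scored per class by how well it separates that class from all others, measured as the distance from 0.5 of a one-vs-rest AUC (the probability that a random in-class value exceeds a random out-of-class value), and each class keeps its own top-k features. The use of AUC is our assumption, since the abstract does not say which nonparametric method the authors used; the modular neural network classifier is omitted and the names are hypothetical.

```python
def auc(pos, neg):
    """Probability that a random in-class value exceeds a random out-of-class value (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))

def class_dependent_features(samples, labels, k):
    """Return {class: sorted indices of its k most separating features}.

    A feature's score for class c is |AUC(c vs rest) - 0.5|, so features on which
    class c is consistently lower also count as discriminative.
    """
    selected = {}
    for c in sorted(set(labels)):
        scores = []
        for f in range(len(samples[0])):
            pos = [s[f] for s, y in zip(samples, labels) if y == c]
            neg = [s[f] for s, y in zip(samples, labels) if y != c]
            scores.append((abs(auc(pos, neg) - 0.5), f))
        scores.sort(reverse=True)
        selected[c] = sorted(f for _, f in scores[:k])
    return selected
```

Each class's selected indices would then pick out the components of that class's own feature vector, which is the sense in which features are class-dependent.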
  • The RGFF representational model: a system for the automatically learned partitioning of “visual patterns” in digital images

    Page(s): 1044 - 1073

    This paper describes a system for the automatically learned partitioning of visual patterns in 2D images, based on band-pass filtering with fixed scale and orientation sensitivity. The visual patterns are defined as the features that have the highest degree of alignment in statistical structure across different frequency bands. The analysis reorganizes the image according to an invariance constraint on statistical structure and consists of three stages: a pre-attentive stage, an integration stage, and a learning stage. The first stage filters the input image with log-Gabor filters; based on their responses, the activated filters, which are selectively sensitive to patterns in the image, are short-listed. In the integration stage, common ground between several activated filters is explored: the filtered responses are analyzed through a family of statistics, and for any two activated filters a distance is derived from the distances between their statistics. The third stage performs cluster partitioning to learn the subspace of log-Gabor filters needed to partition the image data. The clustering is based on a dissimilarity measure intended to highlight the scale and orientation invariance of the responses. The technique is illustrated on real and simulated data sets. Finally, the paper presents a computational visual distinctness measure computed from the image representational model based on visual patterns, and experiments investigate its relation to distinctness as measured by human observers.

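The pre-attentive stage relies on log-Gabor filters. A minimal sketch of one filter's frequency response follows: a Gaussian on a logarithmic frequency axis times a Gaussian in orientation, with zero response at DC. The bandwidth parameters `sigma_ratio` and `sigma_theta` are illustrative defaults, not values taken from the paper, and the function name is hypothetical.

```python
import math

def log_gabor(f, theta, f0, theta0, sigma_ratio=0.55, sigma_theta=math.pi / 8):
    """Frequency response of a 2D log-Gabor filter tuned to radial frequency f0
    and orientation theta0, evaluated at radial frequency f and angle theta."""
    if f <= 0:
        return 0.0  # log-Gabor filters have no DC component
    radial = math.exp(-(math.log(f / f0) ** 2) / (2 * math.log(sigma_ratio) ** 2))
    # Wrap the angular difference into [-pi, pi] before applying the Gaussian.
    d = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
    angular = math.exp(-(d ** 2) / (2 * sigma_theta ** 2))
    return radial * angular
```

Sampling this response on a frequency grid for several `f0` and `theta0` values gives a bank with fixed scale and orientation sensitivity; filtering is then a multiplication in the Fourier domain. The radial term is symmetric on a log axis, so one octave above and one octave below `f0` respond equally.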
  • A solution to the next best view problem for automated surface acquisition

    Page(s): 1016 - 1030

    A solution to the “next best view” (NBV) problem for automated surface acquisition is presented. The NBV problem is to determine which areas of a scanner's viewing volume need to be scanned to sample all of the visible surfaces of an a priori unknown object, and where to position or control the scanner to sample them. A method for determining the unscanned areas of the viewing volume is presented. In addition, a novel representation, positional space, is introduced; it facilitates a solution to the NBV problem by representing what must be scanned and what can be scanned in a single data structure. The number of costly computations needed to determine whether an area of the viewing volume would be occluded from some scanning position is decoupled from the number of positions considered for the NBV, which reduces the computational cost of choosing one. An automated surface acquisition system designed to scan all visible surfaces of an a priori unknown object is demonstrated on real objects.

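The decision the NBV problem asks for can be illustrated with a deliberately simple greedy planner: given which unscanned cells each candidate scanner position could sample, pick the position that covers the most of them, update, and repeat. This sketch assumes the visibility sets (including occlusion) are already computed, and it does not reproduce positional space, which the abstract credits with decoupling the costly occlusion tests from the number of candidate positions. All names are hypothetical.

```python
def next_best_view(unscanned, visible_from):
    """Pick the candidate position that would sample the most still-unscanned cells.

    unscanned    : set of viewing-volume cells not yet sampled
    visible_from : dict mapping a candidate position to the set of cells it can see,
                   with occlusion assumed to be already accounted for
    """
    best = max(visible_from, key=lambda pos: len(visible_from[pos] & unscanned))
    return best, visible_from[best] & unscanned

def plan_scans(unscanned, visible_from, max_scans=10):
    """Repeatedly take the next best view until nothing new can be sampled."""
    unscanned, plan = set(unscanned), []
    for _ in range(max_scans):
        if not unscanned:
            break
        pos, gained = next_best_view(unscanned, visible_from)
        if not gained:
            break  # the remaining cells are not visible from any candidate
        plan.append(pos)
        unscanned -= gained
    return plan
```

Greedy coverage is a heuristic and is not guaranteed to find the minimum number of scans; it is shown only to make the "where to scan next" decision concrete.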
  • Line-based face recognition under varying pose

    Page(s): 1081 - 1088

    Much research in human face recognition involves fronto-parallel face images and constrained rotations in and out of the plane, and operates under strict imaging conditions such as controlled illumination and limited facial expressions. Face recognition using multiple views in the viewing sphere is a more difficult task, since face rotations out of the imaging plane can occlude facial structures. In this paper, we propose a novel image-based face recognition algorithm that uses a set of random rectilinear line segments of 2D face image views as the underlying image representation, together with a nearest-neighbor classifier as the line matching scheme. The combination of 1D line segments exploits the inherent coherence in one or more 2D face image views in the viewing sphere. The algorithm achieves high generalization recognition rates for rotations both in and out of the plane, is robust to scaling, and is computationally efficient. Results show that its classification accuracy is superior to that of benchmark algorithms and that it can recognize test views in quasi-real time.

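A minimal sketch of a line-segment representation of this kind: a fixed set of random straight segments is sampled from every image, the pixel values along all segments are concatenated into one vector, and a query is labeled by its nearest gallery vector. Segment count, length, nearest-pixel sampling, and squared Euclidean distance are our assumptions; the paper's exact line parameterization and matching scheme may differ, and the names are hypothetical.

```python
import math
import random

def random_segments(height, width, count, seed=0):
    """A fixed set of random segments (start point and direction), shared by all images."""
    rng = random.Random(seed)
    return [(rng.uniform(0, height - 1), rng.uniform(0, width - 1), rng.uniform(0, math.pi))
            for _ in range(count)]

def line_features(image, segments, length):
    """Concatenate nearest-pixel samples along every segment; samples outside the image are 0."""
    h, w = len(image), len(image[0])
    feats = []
    for y0, x0, angle in segments:
        for t in range(length):
            y = int(round(y0 + t * math.sin(angle)))
            x = int(round(x0 + t * math.cos(angle)))
            feats.append(image[y][x] if 0 <= y < h and 0 <= x < w else 0.0)
    return feats

def nearest_neighbour(query, gallery):
    """Return the label of the gallery vector closest to `query` (squared Euclidean distance)."""
    _, label = min(gallery, key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item[0])))
    return label
```

Reusing the same segments for gallery and query images is what makes the concatenated vectors comparable position by position.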
  • Indexing without invariants in 3D object recognition

    Page(s): 1000 - 1015

    We present a method of indexing 3D objects from single 2D images. The method does not rely on invariant features, which allows a richer set of shape information to be used in the recognition process. We also suggest the kd-tree as an alternative indexing data structure to the standard hash table. This makes hypothesis recovery more efficient in high-dimensional spaces, which are necessary to achieve specificity in large model databases. Search efficiency is maintained in these regimes by the use of best-bin-first search. Neighbors recovered from the index are used to generate probability estimates, local within the feature space, which are then used to rank hypotheses for verification. On average, the ranking process greatly reduces the number of verifications required. The approach is general in that it can be applied to any real-valued feature vector. In addition, information from real images about the true probability distributions of the feature groupings used for indexing can be added to the index in a straightforward way.

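The kd-tree with best-bin-first search mentioned in the indexing abstract can be sketched as follows. The tree splits at the median along cycling axes; the search descends toward the query, queues every skipped branch keyed by a lower bound on its distance (the gap to the splitting plane), always resumes from the closest queued bin, and stops after `max_checks` points have been examined, which is what makes it approximate. This is a generic illustration: `max_checks` and all names are hypothetical, and the paper's probability estimates for ranking hypotheses are not included.

```python
import heapq
import math

class Node:
    """A kd-tree node holding one point, its label, and its splitting axis."""
    def __init__(self, point, label, axis, left=None, right=None):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right

def build(items, depth=0):
    """Build a kd-tree from (point, label) pairs, splitting at the median of a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return Node(items[mid][0], items[mid][1], axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))

def bbf_search(root, query, k=1, max_checks=50):
    """Approximate k nearest neighbours by best-bin-first search.

    Returns up to k (distance, label) pairs, closest first.
    """
    best = []                  # heap of (-distance, tie, label): the k closest seen so far
    queue = [(0.0, 0, root)]   # heap of (lower bound on distance, tie, subtree)
    tie, checks = 1, 0         # tie-breakers keep the heaps from comparing Node objects
    while queue and checks < max_checks:
        bound, _, node = heapq.heappop(queue)
        if len(best) == k and bound >= -best[0][0]:
            break              # no remaining bin can hold a closer point
        while node is not None and checks < max_checks:
            checks += 1
            heapq.heappush(best, (-math.dist(query, node.point), tie, node.label))
            tie += 1
            if len(best) > k:
                heapq.heappop(best)  # discard the farthest candidate
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                # Every point in `far` lies across the splitting plane, at least |diff| away.
                heapq.heappush(queue, (max(bound, abs(diff)), tie, far))
                tie += 1
            node = near
    return [(-neg_d, label) for neg_d, _, label in sorted(best, reverse=True)]
```

When `max_checks` is at least the number of stored points the search is exact; lowering it trades accuracy for speed, which is the high-dimensional regime where best-bin-first pays off. `math.dist` requires Python 3.8 or later.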

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois